#86 – David Silver: AlphaGo, AlphaZero, and Deep Reinforcement Learning

David Silver leads the reinforcement learning research group at DeepMind. He was lead researcher on AlphaGo and AlphaZero, co-lead on AlphaStar and MuZero, and is behind a lot of important work in reinforcement learning.

Support this podcast by signing up with these sponsors:

– MasterClass: https://masterclass.com/lex

– Cash App – use code “LexPodcast” and download:

– Cash App (App Store): https://apple.co/2sPrUHe

– Cash App (Google Play): https://bit.ly/2MlvP5w

EPISODE LINKS:

Reinforcement learning (book): https://amzn.to/2Jwp5zG

This conversation is part of the Artificial Intelligence podcast. If you would like to get more information about this podcast, go to https://lexfridman.com/ai or connect with @lexfridman on Twitter (https://twitter.com/lexfridman), LinkedIn (https://www.linkedin.com/in/lexfridman/), Facebook (https://www.facebook.com/lexfridman), Medium (https://medium.com/@lexfridman), or YouTube (https://www.youtube.com/lexfridman), where you can watch the video versions of these conversations. If you enjoy the podcast, please rate it 5 stars on Apple Podcasts (https://podcasts.apple.com/us/podcast/artificial-intelligence/id1434243584), follow on Spotify (https://open.spotify.com/show/2MAi0BvDc6GTFvKFPXnkCL), or support it on Patreon (https://www.patreon.com/lexfridman).

Here’s the outline of the episode. On some podcast players you should be able to click the timestamp to jump to that time.

OUTLINE:

00:00 – Introduction

04:09 – First program

11:11 – AlphaGo

21:42 – Rules of the game of Go

25:37 – Reinforcement learning: personal journey

30:15 – What is reinforcement learning?

43:51 – AlphaGo (continued)

53:40 – Supervised learning and self-play in AlphaGo

1:06:12 – Lee Sedol's retirement from Go play

1:08:57 – Garry Kasparov

1:14:10 – AlphaZero and self-play

1:31:29 – Creativity in AlphaZero

1:35:21 – AlphaZero applications

1:37:59 – Reward functions

1:40:51 – Meaning of life
