Course Hive

Robot Learning 2025: Foundational Models for Robotics and Scaling Deep RL - Robot Learning: Scaling Deep Q-Learning, Part 1

This course includes

  • 34.5 hours of video

Summary

In this lecture segment, I explained the progression from multi-armed bandits to Q-learning, outlining the main challenges in reinforcement learning and the methods that address them. I began with multi-armed bandits, emphasizing the exploration-exploitation dilemma and introducing two ways to balance these competing needs: epsilon-greedy action selection and the upper confidence bound (UCB). I then moved to contextual bandits, which condition the choice of action on state information, and finally to Q-learning, which learns a state-dependent policy for sequential problems in which actions also affect future states.

I highlighted the advantages of Q-learning over policy gradients, such as its ability to learn from off-policy data and its lower variance. I then turned to approximate dynamic programming, explaining how value iteration and policy iteration can be used to train a Q-function. I discussed the computational cost of these methods, particularly the need to take an argmax over all possible actions, and how policy iteration can reduce this cost by bootstrapping on the previous policy instead of maximizing over actions in every backup. I concluded by hinting that policy evaluation and policy improvement could be combined into a single step for further efficiency.
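The exploration-exploitation trade-off can be made concrete with a minimal sketch of the two bandit strategies mentioned above, assuming Bernoulli rewards. The arm probabilities, step counts, and function names (`pull`, `epsilon_greedy`, `ucb1`) are hypothetical illustrations, not taken from the lecture; the UCB variant shown is the standard UCB1 rule, whose exploration bonus is sqrt(2 ln t / n_a) when c = 2.

```python
import math
import random


def pull(arm_probs, arm, rng):
    """Hypothetical Bernoulli arm: reward 1 with probability arm_probs[arm], else 0."""
    return 1.0 if rng.random() < arm_probs[arm] else 0.0


def epsilon_greedy(arm_probs, steps, epsilon=0.1, seed=0):
    """Explore uniformly with probability epsilon, otherwise exploit the best empirical mean."""
    rng = random.Random(seed)
    k = len(arm_probs)
    counts, means, total = [0] * k, [0.0] * k, 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(k)                        # explore: random arm
        else:
            arm = max(range(k), key=lambda a: means[a])   # exploit: best estimate so far
        r = pull(arm_probs, arm, rng)
        counts[arm] += 1
        means[arm] += (r - means[arm]) / counts[arm]      # incremental running mean
        total += r
    return counts, total


def ucb1(arm_probs, steps, c=2.0, seed=0):
    """Pick the arm maximizing mean + sqrt(c * ln t / n_a); c = 2 gives standard UCB1."""
    rng = random.Random(seed)
    k = len(arm_probs)
    counts, means, total = [0] * k, [0.0] * k, 0.0
    for t in range(1, steps + 1):
        if t <= k:
            arm = t - 1  # play every arm once so no count is zero
        else:
            arm = max(range(k),
                      key=lambda a: means[a] + math.sqrt(c * math.log(t) / counts[a]))
        r = pull(arm_probs, arm, rng)
        counts[arm] += 1
        means[arm] += (r - means[arm]) / counts[arm]
        total += r
    return counts, total


if __name__ == "__main__":
    probs = [0.2, 0.5, 0.8]  # hypothetical arm success probabilities
    print("epsilon-greedy pulls per arm:", epsilon_greedy(probs, 5000)[0])
    print("UCB1 pulls per arm:", ucb1(probs, 5000)[0])
```

Epsilon-greedy keeps spending a fixed fraction of pulls on random arms forever, while the UCB bonus shrinks as an arm's pull count grows, so exploration of clearly worse arms tapers off on its own.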
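Value iteration on a Q-function can be sketched on a tiny tabular problem. This is a minimal sketch under stated assumptions: a hypothetical deterministic 4-state chain (`NEXT_STATE`, `REWARD`) with discount 0.9, stored as plain Python lists. Every backup applies Q(s, a) <- r(s, a) + gamma * max_a' Q(s', a'), so each update takes a max over all actions at the next state, which is the cost the lecture points to.

```python
GAMMA = 0.9

# Hypothetical deterministic chain: action 0 moves left, action 1 moves right.
# State 3 is an absorbing goal; entering it from state 2 pays reward 1.
NEXT_STATE = [[0, 1], [0, 2], [1, 3], [3, 3]]
REWARD = [[0.0, 0.0], [0.0, 0.0], [0.0, 1.0], [0.0, 0.0]]


def q_value_iteration(next_state, reward, gamma=GAMMA, tol=1e-10, max_iters=10_000):
    """Apply the Bellman optimality backup to every (s, a) until changes fall below tol."""
    n_s, n_a = len(next_state), len(next_state[0])
    q = [[0.0] * n_a for _ in range(n_s)]
    for _ in range(max_iters):
        # Each backup maximizes over all actions at the next state.
        new_q = [
            [reward[s][a] + gamma * max(q[next_state[s][a]]) for a in range(n_a)]
            for s in range(n_s)
        ]
        delta = max(abs(new_q[s][a] - q[s][a]) for s in range(n_s) for a in range(n_a))
        q = new_q
        if delta < tol:
            break
    return q


def greedy(q):
    """Greedy policy argmax_a Q(s, a); ties go to the lowest action index."""
    return [max(range(len(row)), key=lambda a: row[a]) for row in q]


if __name__ == "__main__":
    q = q_value_iteration(NEXT_STATE, REWARD)
    print(greedy(q))  # → [1, 1, 1, 0]
```

Each sweep here performs |S||A| backups, and each backup maximizes over |A| values; for large or continuous action spaces, that inner max is the expensive part.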
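Policy iteration can be sketched on a hypothetical deterministic 4-state chain. The evaluation step bootstraps on the current policy's action at the next state, Q(s', pi(s')), so no max appears inside the backup; the argmax over actions is taken only once per state, in the improvement step. The MDP tables, the all-left starting policy, and the strict-improvement tie-breaking rule are assumptions chosen for illustration, not details from the lecture.

```python
GAMMA = 0.9

# Hypothetical deterministic chain: action 0 moves left, action 1 moves right.
# State 3 is an absorbing goal; entering it from state 2 pays reward 1.
NEXT_STATE = [[0, 1], [0, 2], [1, 3], [3, 3]]
REWARD = [[0.0, 0.0], [0.0, 0.0], [0.0, 1.0], [0.0, 0.0]]


def evaluate_policy(next_state, reward, policy, gamma=GAMMA, tol=1e-10, max_iters=10_000):
    """Policy evaluation: Q(s, a) <- r(s, a) + gamma * Q(s', policy[s']), with no max."""
    n_s, n_a = len(next_state), len(next_state[0])
    q = [[0.0] * n_a for _ in range(n_s)]
    for _ in range(max_iters):
        new_q = []
        for s in range(n_s):
            row = []
            for a in range(n_a):
                s2 = next_state[s][a]
                row.append(reward[s][a] + gamma * q[s2][policy[s2]])  # bootstrap on pi(s')
            new_q.append(row)
        delta = max(abs(new_q[s][a] - q[s][a]) for s in range(n_s) for a in range(n_a))
        q = new_q
        if delta < tol:
            break
    return q


def policy_iteration(next_state, reward, gamma=GAMMA, max_rounds=100):
    """Alternate evaluation and greedy improvement until the policy stops changing."""
    n_s, n_a = len(next_state), len(next_state[0])
    policy = [0] * n_s  # assumed starting policy: always move left
    q = evaluate_policy(next_state, reward, policy, gamma)
    for _ in range(max_rounds):
        # Policy improvement: the only place an argmax over actions is taken.
        new_policy = list(policy)
        for s in range(n_s):
            best = max(range(n_a), key=lambda a: q[s][a])
            if q[s][best] > q[s][policy[s]] + 1e-9:  # switch only on strict improvement
                new_policy[s] = best
        if new_policy == policy:
            break
        policy = new_policy
        q = evaluate_policy(next_state, reward, policy, gamma)
    return policy, q


if __name__ == "__main__":
    policy, q = policy_iteration(NEXT_STATE, REWARD)
    print(policy)
```

Switching an action only when another one is strictly better keeps the loop from cycling between equally good policies.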
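The off-policy advantage of Q-learning can be illustrated with a minimal tabular sketch, assuming a hypothetical 4-state chain and a dataset collected once by a uniformly random behavior policy. Q-learning then replays that fixed dataset. Because its target uses a max over next actions instead of the behavior policy's action, it estimates the values of the greedy policy rather than the random one, and the same update folds policy evaluation and greedy improvement into one step. The chain, step size, episode and epoch counts, and all function names are illustrative assumptions.

```python
import random

N_STATES, GOAL = 4, 3  # hypothetical 4-state chain; reaching state 3 ends the episode


def step(s, a):
    """Deterministic chain: action 1 moves right, action 0 moves left; reward 1 on reaching GOAL."""
    s_next = min(s + 1, GOAL) if a == 1 else max(s - 1, 0)
    return s_next, (1.0 if s_next == GOAL else 0.0)


def collect_random_data(n_episodes=200, max_len=50, seed=0):
    """Behavior policy: uniformly random actions. Returns (s, a, r, s_next, done) tuples."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_episodes):
        s = 0
        for _ in range(max_len):
            a = rng.randrange(2)
            s_next, r = step(s, a)
            done = s_next == GOAL
            data.append((s, a, r, s_next, done))
            if done:
                break
            s = s_next
    return data


def q_learning(data, gamma=0.9, alpha=0.5, epochs=200):
    """Replay a fixed off-policy dataset with the Q-learning update."""
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(epochs):
        for s, a, r, s_next, done in data:
            # Target maximizes over next actions, regardless of what the behavior policy did.
            target = r if done else r + gamma * max(q[s_next])
            q[s][a] += alpha * (target - q[s][a])
    return q


if __name__ == "__main__":
    q = q_learning(collect_random_data())
    print([max(range(2), key=lambda a: q[s][a]) for s in range(GOAL)])
```

Even though every transition came from random behavior, the greedy policy read off the learned Q-values moves right in every non-goal state.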
