Robot Learning 2025: Foundational Models for Robotics and Scaling DeepRL - RobotLearning: Scaling Deep Q-Learning Part1

Keywords

robotics, machine learning, deep learning, foundational models, deep Q-learning

Full Transcript

In this lecture segment, I traced the progression from simple bandits to Q-learning, outlining the main challenges in reinforcement learning and how each method addresses them. I began with multi-armed bandits, emphasizing the exploration-exploitation dilemma and introducing epsilon-greedy and upper confidence bound (UCB) methods for balancing the two. I then moved to contextual bandits, which condition the choice of action on state information, and finally to Q-learning, which learns a state-dependent Q-function from which a policy is derived. I highlighted the advantages of Q-learning over policy gradients, such as its ability to learn from off-policy data and its lower variance. I then turned to approximate dynamic programming, explaining how value iteration and policy iteration can be used to train a Q-function. I discussed the computational cost of these methods, particularly the need to perform an argmax over all possible actions, and how policy iteration can reduce this cost by bootstrapping on previous policies. I concluded by hinting that policy evaluation and policy improvement could be combined into a single step for further efficiency.
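
The dynamic programming idea mentioned above can be illustrated with a minimal sketch of tabular Q-value iteration. This is not the lecture's own code: the four-state chain environment, the `step` function, the discount factor, and all names below are hypothetical and chosen only to keep the example small. The point to notice is the `max` over actions inside the backup, which is the per-update cost the summary refers to.

```python
from typing import List, Tuple

# Hypothetical 4-state chain: states 0..3, state 3 is terminal.
# These constants are illustrative assumptions, not taken from the lecture.
N_STATES = 4
ACTIONS = (0, 1)  # 0 = move left, 1 = move right
TERMINAL = 3
GAMMA = 0.9       # discount factor (assumed)


def step(state: int, action: int) -> Tuple[int, float]:
    """Deterministic toy model: returns (next_state, reward)."""
    if action == 0:
        next_state = max(0, state - 1)
    else:
        next_state = min(TERMINAL, state + 1)
    reward = 1.0 if next_state == TERMINAL else 0.0
    return next_state, reward


def q_value_iteration(n_sweeps: int = 50) -> List[List[float]]:
    """Repeatedly apply Q(s, a) <- r + gamma * max_a' Q(s', a')."""
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    for _ in range(n_sweeps):
        new_q = [row[:] for row in q]
        for s in range(N_STATES):
            if s == TERMINAL:
                continue  # no actions are taken from the terminal state
            for a in ACTIONS:
                s2, r = step(s, a)
                # The max over all next actions is the expensive part
                # once the action space is large or continuous.
                bootstrap = 0.0 if s2 == TERMINAL else max(q[s2])
                new_q[s][a] = r + GAMMA * bootstrap
        q = new_q
    return q


def greedy_policy(q: List[List[float]]) -> List[int]:
    """Pick the highest-valued action in each state."""
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES)]


q = q_value_iteration()
policy = greedy_policy(q)
print(policy[:TERMINAL])  # greedy action for each non-terminal state
```

With only two actions the `max` is trivial, but the same backup needs one Q-value per candidate action, which is why the cost grows with the size of the action space.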
