Full Transcript
All of us have seen videos of highly complex, humanoid-looking robots, like those from Boston Dynamics, doing crazy stunts, flips, and other very impressive things. This gives us the impression that the field of robotics is doing quite well. But ask yourself two questions: (1) Have you seen robots in your locality that can do daily chores? Why hasn't robotics become more commercial and easier to use? (2) Why is the field accessible only to experts? Why can't anyone sitting at home develop an algorithm, implement it on a robot, and see how it works, with the same ease with which we tweak LLMs?
I was intrigued by these same questions, which led me to start a quest to understand how the field of robotics has evolved and where we currently are. The future of robotics has always seemed bright, but it has never been bright enough. First we got excited by reinforcement learning, then by behavioral cloning, but we still do not have general-purpose robots that are truly effective.
However, something has changed in the last three years. We have made incredible progress in AI, largely because of the development of LLMs. So the natural question is this: can LLMs be used as a catalyst to advance the field of robotics as well? The answer is yes. The last six to eight months have seen incredible progress in the development of vision-language-action (VLA) models. These are foundation models for robotics, trained on datasets collected from many different robots through a huge collaboration between a large number of institutions (e.g., the Open-X dataset). They are built on top of VLMs, with an action layer added at the top of the architecture.
Foundation models are at the heart of democratizing robotics research. Researchers sitting in their own labs can take these models, which have already been trained on a large number of skills, and fine-tune them for their own application.
Hugging Face has been doing an incredible job in this space (e.g., LeRobot). I believe that we are at the cusp of a paradigm shift in the field of robotics. In the next three to five years, we will see huge advancements in practical robotics products and research.
To help people join this robotics movement and contribute meaningfully, we are launching a course called "Modern Robot Learning From Scratch". The course will be highly practical, and we will see five trajectories converging beautifully: RL, imitation learning, diffusion models, VLMs, and VLAs.
I am writing this post to inspire anyone who has the slightest interest in the future of AI and robotics. I really think that the gap between hardware and software is closing. AI engineers who think they are bad at hardware, and people who are good with practical, hands-on work but think of AI as too complex, should sit down and rethink, because this is an ideal field for everyone to contribute to.
