November 22, 2024
Xiaolong Wang, UC San Diego

Building a humanoid robot that operates like a human has long been a goal in robotics. A humanoid robot provides a general-purpose platform for the diverse tasks we perform in our daily lives. In this talk, I will present a two-level learning framework designed to equip humanoid robots with robust mobility and manipulation skills, enabling them to generalize across diverse tasks, objects, and environments. The first level focuses on training Vision-Language-Action (VLA) models on human video data for both navigation and manipulation. These models predict “mid-level” actions, that is, precise movements or trajectories of the human body and hands, conditioned on language instructions. The second level develops low-level robot skills: manipulation skills learned through teleoperation, and humanoid whole-body control skills learned via motion imitation and Sim2Real transfer. By combining VLA models trained on human data with these low-level robot skills, this framework offers a scalable pathway toward general-purpose humanoid robots.

About the speaker: https://xiaolonw.github.io/

More about the course can be found here: https://stanfordasl.github.io/robotics_seminar/

View the entire AA289 Stanford Robotics and Autonomous Systems Seminar playlist: https://www.youtube.com/playlist?list=PLoROMvodv4rMeercb-kvGLUrOq4HR6BZD

► Check out the entire catalog of courses and programs available through Stanford Online: https://online.stanford.edu/explore
