Robotics and Embodied AI II
Conveners
- Shinjae Yoo (Brookhaven National Laboratory)
Like dominoes, some of the greatest technical challenges of robotics have fallen one by one: physical safety (2000–2010), computer vision (2010–2015), legged locomotion (2015–2020), and even high-level semantic intelligence and language processing (2020–2025) have all made leaps previously thought impossible. What’s standing between us and the general-purpose robot of the future, deployed in...
In this talk, we discuss multimodal video models and their importance in robot learning. We first cover multimodal video-language models that capture semantic and motion information in videos. We then discuss how such video models could benefit vision-language-action (VLA) models for learning robot visuomotor action policies. VLA models including LLaRA and LangToMo as well as applications of...