Embodied Vision-and-Language Navigation - A Universal View
COM2 Level 4
Executive Classroom, COM2-04-02

Abstract:
Embodied Vision-and-Language Navigation (VLN) lies at the heart of enabling robots to understand and act on natural language in the physical world. In this talk, I present a universal view of VLN as a foundational problem that unifies perception, language, memory, and action, bridging the long-standing gap between simulation and real-world autonomy. I analyse four core Sim2Real challenges—domain shift, dynamic environments, physical constraints, and computational efficiency—and show how my recent work on large language models, lifelong scene adaptation, and efficient embodied policies addresses these barriers. Through real-world robot deployments and a Real→Sim→Real learning paradigm, I illustrate how VLN is evolving from a benchmark task into a key building block for scalable, reliable, and human-centred embodied AI.
Biography:
Dr. Qi Wu is an Associate Professor at the University of Adelaide and was a recipient of the Australian Research Council (ARC) Discovery Early Career Researcher Award (DECRA) from 2019 to 2021. He currently serves as the Director of Vision and Language at the Australian Institute for Machine Learning (AIML). In 2019, he received the J G Russell Award from the Australian Academy of Science. Dr. Wu earned his PhD in Computer Science from the University of Bath in 2015, where he had also completed his MSc in 2011. His research interests lie primarily in computer vision and machine learning, with a particular focus on vision-language interaction. He has deep expertise in image captioning and visual question answering (VQA), and is one of the pioneers of the Vision-and-Language Navigation (VLN) task. He has published over 200 papers in top-tier journals and conferences such as TPAMI, CVPR, ICCV, and ECCV. He also serves as an Area Chair for leading conferences including CVPR, ICCV, NeurIPS, and ICML.

