CS SEMINAR

Self-supervised models for navigation and depth estimation

Speaker
Dr. Karteek Alahari, Inria Grenoble - Rhone-Alpes Center
Chaired by
Dr. Angela YAO Yingjie, Dean's Chair Assistant Professor, School of Computing
ayao@comp.nus.edu.sg

13 Dec 2022 Tuesday, 10:30 AM to 11:30 AM

SR2, COM1-02-04

Abstract:
In this talk, we present models trained in a self-supervised fashion for two tasks. Our first work addresses the problem of navigating to a location indicated by a target image in a previously unseen environment. Earlier attempts, including RL-based and SLAM-based approaches, either generalize poorly or rely heavily on pose/depth sensors. We present a novel method that leverages a cross-episode memory to learn to navigate. Here, we train a state-embedding network in a self-supervised fashion, and then use it to embed previously visited states into a memory.

In our second work, we propose a new alternative for densely estimating metric depth by combining a monocular camera with a lightweight LiDAR, e.g., with 4 beams, typical of today's automotive-grade mass-produced laser scanners. Our framework, called LiDARTouch, estimates dense depth maps from monocular images with the help of "touches" of LiDAR, i.e., without the need for dense ground-truth depth. In our setup, the minimal LiDAR input contributes on three different levels: as an additional input to the model, in a self-supervised LiDAR reconstruction objective function, and in estimating changes of pose.


Biodata:
Karteek Alahari is a tenured researcher (charge de recherche Inria) in the Thoth research team (formerly known as LEAR), based at the Inria Grenoble - Rhone-Alpes center. He was previously a postdoctoral fellow in the Inria WILLOW team at the Department of Computer Science, Ecole Normale Superieure (ENS), working with Ivan Laptev, Jean Ponce, and Josef Sivic. He completed his Ph.D. in July 2010, under the supervision of Philip Torr. He is also an associate member of the Visual Geometry Group, University of Oxford, and the WILLOW team at ENS.

Karteek's current research focuses on addressing the visual understanding problem in the context of large-scale datasets. In particular, he works on learning robust and effective visual representations when only partially supervised data is available. This includes frameworks such as incremental learning, weakly supervised learning, and adversarial training.