When We First Met: Visual-Inertial Person Localization for Co-Robot Rendezvous

Xi Sun, Xinshuo Weng, Kris Kitani

Robotics Institute, Carnegie Mellon University

International Conference on Intelligent Robots and Systems (IROS), 2020

Oral Presentation

One-Sentence Summary

We propose the first method, and collect a dataset, for a new task: visual-inertial person localization (VIPL), which aims to localize a target person using measurements from an inertial sensor they carry, without access to the person's appearance in advance.

Demo Video (15-minute oral presentation at IROS 2020)

Demo Video (4-minute short presentation)


We aim to enable robots to visually localize a target person with the aid of an additional sensing modality: the target person's 3D inertial measurements. The need for such technology may arise when a robot must meet a person in a crowd for the first time, or when an autonomous vehicle must rendezvous with a rider in a crowd without knowing the person's appearance in advance. A person's inertial information can be measured with a wearable device such as a smartphone and shared selectively with an autonomous system during the rendezvous. We propose a method to learn a visual-inertial feature space in which the motion of a person in video can be easily matched to the motion measured by a wearable inertial measurement unit (IMU). The transformation of the two modalities into the joint feature space is learned with a triplet loss, which forces inertial motion features and video motion features generated by the same person to lie close together in the joint feature space. To validate our approach, we compile a dataset of over 3,000 video segments of moving people, along with wearable IMU data. We show that our method localizes a target person with 80.7% accuracy, averaged over test data with varying numbers of candidates, using only 5 seconds of IMU data and video.
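The matching idea in the abstract can be illustrated with a minimal sketch. It assumes the IMU and video features have already been mapped into the joint space by learned encoders (not shown here), and it assumes Euclidean distance and a margin of 1.0, neither of which is stated on this page. The function names `triplet_loss` and `localize` and the toy feature values are hypothetical, chosen only for illustration.

```python
import math

def l2_distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss (margin value is an assumption).

    The loss is zero once the negative is at least `margin` farther
    from the anchor than the positive is.
    """
    gap = l2_distance(anchor, positive) - l2_distance(anchor, negative)
    return max(0.0, gap + margin)

def localize(imu_feature, candidate_features):
    """Return the index of the candidate whose video feature lies
    nearest to the IMU feature in the joint space."""
    distances = [l2_distance(imu_feature, c) for c in candidate_features]
    return min(range(len(distances)), key=distances.__getitem__)

# Toy 2-D embeddings (made-up numbers, for illustration only).
imu = [0.0, 0.0]                                   # anchor: IMU feature
candidates = [[3.0, 4.0], [0.6, 0.8], [5.0, 12.0]]  # video features per person

target = localize(imu, candidates)
# Candidate 1 plays the positive (same person), candidate 0 the negative.
loss = triplet_loss(imu, candidates[1], candidates[0], margin=1.0)
```

In this toy setup the IMU feature serves as the anchor, the video feature of the same person as the positive, and the video feature of another person as the negative. At test time only `localize` is needed: the candidate nearest to the IMU feature is selected as the target.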



author = {Sun, Xi and Weng, Xinshuo and Kitani, Kris}, 
journal = {IROS}, 
title = {{When We First Met: Visual-Inertial Person Localization for Co-Robot Rendezvous}}, 
year = {2020} 
