Building Personal AIs with First Person (Egocentric) Vision

Tutorial at VISAPP 2019 - 14th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications

February 25th 2019

[Slides] [Further Reading]

Tutorial Description

Abstract

The increasing availability of wearable devices capable of acquiring and processing images and video from the user's point of view (e.g., Google Glass, Microsoft HoloLens and Magic Leap One) has increased the interest of the computer vision community in first person (egocentric) vision. Being portable and able to mediate reality as perceived by their users, such devices are ideal candidates for implementing personal intelligent assistants which can understand our behavior and augment our abilities. Unlike standard “third person vision”, which assumes that the processed images and video are acquired from a static point of view neutral to the perceived events, first person (egocentric) vision assumes that images and video are acquired from the non-static point of view of the user by means of a wearable device. These acquisition settings set first person (egocentric) vision apart from standard third person vision. Most notably, the visual information collected using wearable cameras always “tells something” about the user, revealing what they do, what they pay attention to and how they interact with the world. Moreover, wearable devices make it possible to effortlessly collect huge quantities of user-centric visual data. In this tutorial, we will discuss the challenges and opportunities offered by first person (egocentric) vision, cover the historical background and seminal works, present the main technological tools (including devices and algorithms) which can be used to analyze first person visual data, and outline open problems.

Keywords

wearable, first person, egocentric, localization, action recognition, action anticipation

Aims and learning objectives

Participants will learn the main advantages of first person (egocentric) vision over third person vision for understanding the user’s behavior and building personalized applications. Specifically, participants will learn about: 1) the main differences between third person and first person (egocentric) vision, including the way in which the data is collected and processed; 2) the devices which can be used to collect data and provide services to users; 3) the algorithms which can be used to manage first person visual data, for instance to perform localization, indexing, and action and activity recognition.

Target audience

First-year PhD students, graduate students and researchers.

Prerequisite knowledge of audience

Fundamentals of computer vision and machine learning (including deep learning).

Slides

Part 1 Part 2

Further Reading - References

First Person Vision @ IPLAB