Building the Next Generation of Virtual Personal Assistants with First Person (Egocentric) Vision: From Visual Intelligence to AI and Future Predictions

Tutorial at ICIAP 2021 - 21st International Conference on Image Analysis and Processing

May 24 2022

Tutorial Description

Abstract

Wearable devices equipped with a camera and computing abilities are attracting the attention of both the market and society, with commercial devices increasingly available and many companies announcing the upcoming release of new devices. The main appeal of wearable devices lies in their mobility and in their ability to enable user-machine interaction through Augmented Reality. These characteristics make wearable devices an ideal platform for developing intelligent assistants able to assist humans and augment their abilities, a goal in which Artificial Intelligence and Computer Vision play a major role. Unlike classic computer vision (the so-called “third person vision”), which analyses images collected from a static point of view, first person (egocentric) vision assumes that images are collected from the point of view of the user, which gives privileged information on the user’s activities and the way they perceive and interact with the world. Indeed, the visual data acquired with wearable cameras usually provides useful information about the users, their intentions, and how they interact with the world. This tutorial will discuss the challenges and opportunities offered by first person (egocentric) vision, covering the historical background and seminal works, presenting the main technological tools and building blocks, and discussing applications.

Keywords

wearable, first person vision, egocentric vision, visual localization, action recognition, action anticipation, object interaction detection

Aims and learning objectives

The participants will understand the main advantages of first person (egocentric) vision over third person vision for analyzing the user’s behavior, building personalized applications and predicting future events. Specifically, the participants will learn about: 1) the main differences between third person and first person (egocentric) vision, including the way in which the data is collected and processed, 2) the devices which can be used to collect data and provide services to the users, 3) the algorithms which can be used to manage first person visual data, for instance to perform localization, indexing, object detection, action recognition, and the prediction of future events.

Program

[14.00 - 15.30] Part I: Definitions, motivations, history and research trends - Antonino Furnari [Slides]

What is first person vision? What is it for?
What makes it different from third person vision?
History of First Person Vision: visions, ideas, research, devices;
Where do we go from here? Research trends, datasets and challenges.

[15.30 – 16.00] Coffee Break

[16.00 - 18.00] Part II: Building Blocks for First Person Vision Systems – Francesco Ragusa [Slides]

Data Acquisition & Datasets;
Fundamental Tasks in First Person Vision:
- Localization;
- Object Detection and Recognition;
- Egocentric Human-Object Interaction;
- Action/Activities;
- Anticipation.
Real Application Examples developed at Next Vision;
Conclusion.