CRII: RI: Learning Predictive Representations from Unlabeled Video

Vondrick, Carl (PI)

Columbia University

Project

Description

The project studies computer systems that predict how objects and people will move, even when they are out of sight due to occlusion, for example keys inside pockets. Predictive models have the potential to enable many new applications impacting health, security, and robotics, which can improve the efficiency, safety, and welfare of the overall population. To achieve this, the research investigates computer vision algorithms that learn the visual patterns for prediction automatically from large amounts of video data. The resulting software will be able to track objects obscured by occlusion, accurately represent shadows in video, and forecast object movements into the future. The project will provide research opportunities for both graduate and undergraduate students, and increase diversity in machine intelligence research. Outcomes from this project will translate into course material to teach students in computer science and machine learning.

This research focuses on robustly generalizing predictive models to the natural diversity and complexity of real-world video. While large annotated datasets fuel rapid advancements in visual scene recognition, machine understanding of events and dynamics remains challenging because the amount of knowledge required for video understanding is vast and potentially ambiguous. Rather than rely on annotation, the investigators aim to capitalize on large amounts of raw, unlabeled video in order to create machine algorithms that efficiently learn to predict the future behaviors of events, objects, and people. Building on highly competitive frameworks from the research team and others, this project will leverage natural redundancy in unlabeled video, such as color coherency and repetitive motion, to train deep convolutional neural networks without human supervision. The research team proposes extensions to spatiotemporal memory models to handle such situations, and methods to learn representations of color constancy that will improve tracking performance. The investigators also propose analysis tools to measure and visualize the representation that emerges, enabling new methods to quantify performance in predictive models.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Status	Finished
Effective start/end date	6/15/19 → 5/31/22

Funding

National Science Foundation: US$175,000.00

Keywords

Artificial Intelligence
Computer Science (all)
