Collaborative Research: RI: Medium: Programmatic Foundation Models for Visual Analysis on a Planetary Scale

  • Vondrick, Carl (PI)

Project: Research project

Project Details

Description

Imagery around the world—from satellites to drones and social media photographs—provides vital information about our planet. There is a unique opportunity in the fields of artificial intelligence and computer vision to understand global and local phenomena from these images, providing insight about climate change, public health, and agriculture. However, state-of-the-art methods in computer vision are not designed for these applications, where decision-making is complex and accuracy, robustness, and interpretability are required. Existing large-scale AI models, such as ChatGPT, only process individual images on the internet and cannot synthesize conclusions from planet-scale image collections. Even on single images, these models cannot reliably perform sophisticated logical reasoning, and building models that do such reasoning reliably requires infeasibly large datasets. Creating such large models and datasets is a significant barrier for scientific and societal applications of computer vision, particularly for organizations that do not have the computational resources of large corporations. This project will create a new class of machine learning models, called programmatic foundation models, that have the capability and efficiency to scale to planetary-scale image and video datasets. These models can be queried by experts using natural language, empowering scientists and domain experts to benefit from AI-driven visual discovery over the vast amounts of visual information available in satellite imagery, even if they lack expertise in machine learning. The proposed research has applications across public health, climate change, agriculture, security, and the economy. The research objective of this project is to tightly integrate visual representations and program synthesis, thereby delivering an accurate, interpretable, and robust machine learning framework for answering questions about what is visible in image collections.
Across two research thrusts, the project will drive the creation of these new programmatic foundation models. The first thrust proposes new techniques for building open-world recognition primitives across multiple sensing modalities based on vision-language models, but without any language annotations. It introduces new cross-modal contrastive learning techniques, as well as approaches for reasoning about temporal change. The second thrust proposes new techniques for learning to synthesize programs, incorporating uncertainty, learning from feedback, and adaptive computation. Given a query, the proposed framework learns to synthesize a customized program that breaks the task down into constituent steps and control flow that can be directly executed to solve the vision task. To execute each step, the project proposes new methods for training open-world classification, detection, and segmentation models for satellite, aerial, and ground imagery. Unlike prior foundation models, this integrated approach has many potential benefits in interpretability, logical soundness, modularity, compositionality, efficiency, and generality to different tasks. The two thrusts taken together combine program synthesis with open-world recognition models for analyzing satellite, drone, and ground imagery around the world.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
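The cross-modal contrastive learning the first thrust mentions can be illustrated with a minimal sketch. This is a standard symmetric InfoNCE-style objective between paired embeddings from two modalities (e.g., satellite and ground views of the same location); the function name, the use of NumPy, and the temperature value are illustrative assumptions, not the project's actual method.

```python
import numpy as np

def cross_modal_info_nce(z_a, z_b, temperature=0.07):
    """Symmetric contrastive loss between two modalities.

    z_a, z_b: (N, D) arrays of embeddings, where row i of each
    array comes from the same underlying scene (a positive pair).
    """
    # L2-normalize so the dot product is cosine similarity.
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = z_a @ z_b.T / temperature  # (N, N) cross-modal similarities
    n = len(z_a)

    def cross_entropy(l):
        # Matching pairs sit on the diagonal; off-diagonals are negatives.
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[np.arange(n), np.arange(n)].mean()

    # Average the two retrieval directions (a -> b and b -> a).
    return 0.5 * (cross_entropy(logits) + cross_entropy(logits.T))
```

Minimizing this loss pulls paired embeddings together and pushes non-pairs apart, which is what allows recognition primitives to be learned across modalities without language annotations.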
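The second thrust's query-to-program pipeline can be sketched as follows. Everything here is a hypothetical stand-in: the "image" is a toy list of labeled regions in place of real model outputs, the primitives `find` and `count` are placeholders for the open-world detection and classification models the thrust proposes, and the program is written by hand rather than synthesized by a learned model.

```python
# Toy "image": labeled regions standing in for detector outputs.
image = [
    {"label": "solar panel"},
    {"label": "solar panel"},
    {"label": "rooftop"},
]

# Placeholder recognition primitives; a real system would back these
# with trained open-world detection/segmentation models.
PRIMITIVES = {
    "find": lambda img, category: [r for r in img if r["label"] == category],
    "count": len,
}

# A program a synthesizer might emit for the query
# "How many solar panels are visible?" -- hand-written here.
program_src = """
def answer(image):
    panels = find(image, "solar panel")
    return count(panels)
"""

def execute(program_src, image):
    """Run a synthesized program with the recognition primitives in scope."""
    scope = dict(PRIMITIVES)
    exec(program_src, scope)
    return scope["answer"](image)

print(execute(program_src, image))  # -> 2
```

Because the answer is produced by an explicit, inspectable program rather than a single end-to-end forward pass, each intermediate step can be examined, which is the source of the interpretability and modularity benefits the abstract describes.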
Status: Active
Effective start/end date: 8/15/24 – 7/31/28

ASJC Scopus Subject Areas

  • Artificial Intelligence
  • Space and Planetary Science
  • Computer Networks and Communications
  • Engineering (all)
  • Computer Science (all)