Project Details
Description
This research program introduces a computer vision framework that produces explainable, correctable, and extendable visual representations for image recognition. We force the model to utilize object attributes to predict a category label. Given an image, our framework produces a human-readable justification that is certified to explain the internal reasoning of the model. The core of our project is a visual dictionary of the objects, activities, and events in the visual world. We introduce a new method for constructing this visual dictionary on a large scale without requiring manual supervision, capitalizing on current large natural language models. Each thrust will share this representational interface, driving deep integration throughout the project. Instead of a bag of separate attributes utilized in prior approaches we propose a method for capturing both relations between attributes and between attributes and classes, as well as sub-classes. This should lead to improvements both in generalizing to new instances of training classes and to novel classes based on new combinations of attributes, as in few-shot learning. Finally, prior work has failed to demonstrate clear advantages to extracting attributes, and representing objects with them. Here we propose an automated self-improvement loop for the model, where challenge datasets are constructed, model errors are analyzed, and new attributes are added to the model, thereby improving model performance.
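The idea of predicting a category label only through object attributes can be illustrated with a minimal sketch. This is not the project's implementation: the dictionary entries, the attribute scores, the averaging rule, and the function name `classify_with_justification` are all hypothetical, chosen only to show how a prediction routed through attributes yields a justification that reflects what the classifier actually used.

```python
# Hypothetical visual dictionary: each class maps to attributes that describe it.
# In the project such a dictionary is built with large language models; here it
# is written by hand purely for illustration.
VISUAL_DICTIONARY = {
    "zebra": ["four legs", "stripes", "mane"],
    "tiger": ["four legs", "stripes", "orange fur"],
    "horse": ["four legs", "mane", "solid coat"],
}


def classify_with_justification(attribute_scores, dictionary=VISUAL_DICTIONARY):
    """Score each class as the mean of its attribute scores.

    Returns the best label and its attributes sorted by score, which serve as
    the human-readable evidence for the decision.
    """
    class_scores = {}
    for label, attributes in dictionary.items():
        scores = [attribute_scores.get(name, 0.0) for name in attributes]
        class_scores[label] = sum(scores) / len(scores)

    best = max(class_scores, key=class_scores.get)
    evidence = sorted(
        ((name, attribute_scores.get(name, 0.0)) for name in dictionary[best]),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return best, evidence


# Assumed stand-in for the output of an attribute detector on one image.
example_scores = {
    "four legs": 0.9,
    "stripes": 0.8,
    "mane": 0.7,
    "orange fur": 0.1,
    "solid coat": 0.05,
}

label, evidence = classify_with_justification(example_scores)
print(label)
for name, score in evidence:
    print(f"  {name}: {score:.2f}")
```

In this toy setup, correcting or extending the model amounts to editing the dictionary: adding an attribute to a class changes both its score and the evidence reported for it.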
Status | Active |
---|---|
Effective start/end date | 5/1/23 → … |
Keywords
- Computer Vision and Pattern Recognition
- Social Sciences (all)