Latent Dependence and Identifiable, Graphical, Deep Modeling of Discrete Latent Variables

Gu, Yuqi (PI)

Columbia University

Project

Description

In the data science era, complex, dependent, and heterogeneous data arise in many subject areas, from education to psychology to medicine. Latent variable models are powerful statistical approaches for handling such complex data. However, existing statistical methods for latent variable analysis are mostly limited to relatively simple settings and cannot meet the needs of modern high-dimensional applications. One critical motivating example for this project is personalized learning, in which educators aim to diagnose individual students' latent strengths and weaknesses across many skills based on educational assessment data. In this setting, it is highly desirable to make discrete statistical diagnoses of students' fine-grained skills, to understand the relationships among latent skills and the underlying cognitive processes, and to develop targeted remedial instruction. To achieve these goals, this project aims to develop a suite of new statistical tools for discrete latent variable modeling. The new methodology is intended to apply not only to educational data but also to data from psychology, medicine, genetics, and the health sciences. The tools will be implemented in publicly available software. These research tools are expected to help practitioners uncover hidden information about students, patients, and biological systems in a statistically principled manner. In addition, this project will provide multiple training opportunities for graduate and undergraduate students, introducing them to the important area of latent variable models in modern statistics.

This project aims to advance the statistical theory and methodology of discrete latent variable modeling and to provide novel statistical algorithms for education and other applications. The project has three objectives.
The first is to develop new mathematical machinery for studying identifiability in general discrete models with latent and graphical components. These techniques will be used to derive sharp identifiability conditions for models motivated by the education sciences. The second objective is to develop two new families of generative models with discrete latent variables: deep generative models with multilayer latent structures, and probabilistic graphical models encoding hard hierarchical latent constraints. Identifiability of these models will be established, guaranteeing the validity of statistical inference. The resulting models are expected to shed light on latent dependencies in several applications, particularly educational diagnosis and personalized learning. The third objective is to develop novel hypothesis tests of identifiability, flexible Bayesian methods that simultaneously infer latent dimensions and other parameters, and efficient structure-learning procedures to estimate the latent graphical constraints. The project will offer opportunities for the professional development of trainees at the interface of statistics, data science, psychology, and the educational sciences.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Status	Active
Start/End date	9/1/22 → 8/31/25

Funding

National Science Foundation

Keywords

Statistics and Probability
Mathematics (all)
Physics and Astronomy (all)

Access the project

https://www.nsf.gov/awardsearch/showAward?AWD_ID=2210796
