Multivariate Distribution-Free Nonparametric Testing Using Optimal Transportation award

  • Sen, Bodhisattva (PI)

Project: Research project

Project Details

Description

Nonparametric methods have become increasingly popular in the theory and practice of statistics, primarily because of their fundamental advantages over parametric methods: greater flexibility and more 'data-driven' features. In this project the Principal Investigator (PI) analyzes estimation, computation, and uncertainty quantification in two important areas of nonparametric statistics. Special emphasis is given to methods applicable to multivariate data, an area that has received relatively little attention even though such methods are often necessary in real data analyses. The methods developed will be tuning-free, computationally feasible, and well-defined under minimal assumptions on the underlying models. On the collaborative front, the PI will continue interdisciplinary research in astronomy; in particular, some of the methodology discussed in this project will address important scientific questions arising in astronomy and will form the dissertation theses of two PhD students at Columbia. The PI will also continue the tradition of mentoring summer interns. The graduate student support will be used for interdisciplinary research and for writing code.

The PI investigates two core directions of research in nonparametric statistics, with special emphasis on multivariate data. The main thrust of this project is to study multivariate distribution-free nonparametric rank-based methods that generalize the classical univariate rank-based procedures to multivariate data. This new general framework crucially uses ideas from the theory of optimal transport, an important and very active research area in applied mathematics, probability, and machine learning. The second part of the project concerns multivariate nonparametric (heteroscedastic) mixture models and is directly motivated by astronomy collaborations involving the PI. Statistical inference on stellar populations of interest is complicated by significant observational limitations, in particular by heteroscedastic measurement errors. Indeed, almost all data sets in astronomy contain known (heteroscedastic) error measurements on every observation. This naturally leads to data that can be modeled as nonparametric (heteroscedastic) mixture models. The PI, along with his collaborators, will investigate several aspects of this problem: (a) estimating the data distributions, (b) denoising the observations, and (c) studying the associated deconvolution and manifold learning problems. For both of the above problems, a systematic theoretical study of the methods will be undertaken.
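As a rough illustration of how optimal transport can yield multivariate ranks, the sketch below assigns each data point to a distinct point of a fixed reference set so that the total squared Euclidean distance is minimized. It is not taken from the project itself: the function name `ot_ranks`, the choice of reference points, and the brute-force search over permutations (feasible only for very small samples) are all assumptions made for illustration. In one dimension, with reference points i/n, the assignment reduces to the classical ranks divided by n.

```python
from itertools import permutations


def ot_ranks(points, reference):
    """Hypothetical helper: map each data point to a distinct reference point
    so that the total squared Euclidean distance is minimal.

    Brute force over all permutations, so only usable for very small n.
    """
    n = len(points)
    if n != len(reference):
        raise ValueError("points and reference must have the same length")

    def sq_dist(p, q):
        # Squared Euclidean distance between two equal-length tuples.
        return sum((a - b) ** 2 for a, b in zip(p, q))

    def total_cost(perm):
        # Cost of sending points[i] to reference[perm[i]] for every i.
        return sum(sq_dist(points[i], reference[perm[i]]) for i in range(n))

    best = min(permutations(range(n)), key=total_cost)
    return [reference[j] for j in best]


# One dimension: the assignment recovers the usual ranks divided by n.
data_1d = [(2.7,), (-1.0,), (0.3,), (5.2,)]
grid_1d = [(i / 4,) for i in range(1, 5)]
print(ot_ranks(data_1d, grid_1d))  # [(0.75,), (0.25,), (0.5,), (1.0,)]

# Two dimensions: each observation receives one point of a fixed 2 x 2 grid.
data_2d = [(0.1, 2.0), (1.5, -0.3), (-0.7, 0.4), (2.2, 1.1)]
grid_2d = [(0.25, 0.25), (0.25, 0.75), (0.75, 0.25), (0.75, 0.75)]
print(ot_ranks(data_2d, grid_2d))
```

Because the ranks are always a rearrangement of the same fixed reference set, statistics built from them do not depend on the data distribution under exchangeability, which is the sense in which such procedures are distribution-free. A practical implementation would replace the permutation search with a proper assignment solver.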

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Status: Finished
Effective start/end date: 7/1/20 to 6/30/23

Funding

  • National Science Foundation: US$300,000.00

ASJC Scopus Subject Areas

  • Statistics, Probability and Uncertainty
  • Statistics and Probability
  • Mathematics (all)
