Project Details
Description
Many challenges facing statisticians today are related to making decisions based on, and drawing conclusions from, complex datasets that are quite different from those studied before the existence of modern computers. For instance, data collected in fields like genetics, astronomy, and finance are now often high-dimensional, unstructured, multimodal, and extremely large. Therefore, a core challenge facing statisticians in today's world is to develop and analyze methods for learning from data under two restrictions: the procedures must remain computationally efficient in the face of such modern complexities, and they must perform close to the theoretically optimal limits when enough data is available. In particular, with the increasing societal value of massive data collection and processing, we must build statistical estimation systems and procedures that are energy-efficient and hence sustainable. This project will lead to increases in the computational efficiency of algorithms and, conceptually, will be crucial to improving estimation quality from a reduced number of measurements in large-scale problems, while simultaneously creating new applications of such methods in wireless communications.
Beyond the research activities, this project includes specific initiatives to develop the research arm of a program to identify, support, and help build the academic portfolios of undergraduate students in the New York City tri-state area who aspire to be researchers in statistics and data science and who are from historically underrepresented populations in these disciplines, as well as to formalize and expand undergraduate research opportunities for local students with an emphasis on training a new generation of statisticians and data scientists with interdisciplinary skill sets and research interests.
This project will tackle challenges caused by modern, complex data by investigating the following questions: (A) Exactly how well do modern statistical procedures perform when datasets are growing rapidly? (B) Given a complex statistics or machine learning task, how much data, or information, is needed to solve it? How much data is needed if we impose computational constraints on algorithm efficiency? (C) How can recent advances in understanding high-dimensional statistics be used for engineering systems design? We will address three lines of inquiry related to these challenges. First, as a community, we have an incomplete understanding of how standard statistical estimation methods perform in high-dimensional settings, where the number of parameters grows with the number of data points. To address this, the proposed work will provide rigorous theoretical guarantees for estimation performance for large classes of penalized estimators for high-dimensional (generalized) linear models. Second, while we have a fairly complete picture of the fundamental limits beyond which no algorithm can successfully extract signal from noise in statistical estimation, detection, and inference, we have a much more limited understanding of such fundamental limits when constraints are placed on algorithm efficiency. The proposed work will establish such computational limits for various modern procedures under complex, but relevant, modeling assumptions that allow the problem structure to change as dimensions grow. Finally, the key challenge in wireless communication is to devise coding schemes for transmitting information reliably from a sender to a receiver through a noisy channel that are computationally efficient, have a low probability of decoding error, and allow for data rates close to the information-theoretically optimal value, the channel capacity. This project will design new algorithms for this kind of communication by leveraging ideas from high-dimensional statistical estimation procedures where we know efficient algorithms can perform close to information-theoretic limits.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
Status | Active |
---|---|
Effective start/end date | 9/1/24 → 8/31/26 |
ASJC Scopus Subject Areas
- Computer Science(all)
- Statistics and Probability
- Mathematics(all)
- Physics and Astronomy(all)