Project Details
Description
Technological innovations have been a primary force in the advancement of scientific research and in social progress. High-throughput data of unprecedented size and complexity now arise frequently in diverse fields of science and the humanities, ranging from computational biology and health studies to financial engineering and risk management. Such high-dimensional data have given rise to many important problems in contemporary statistics, in which feature selection plays a pivotal role. The proposed project has three interrelated objectives on the theme of high-dimensional data, with applications in classification and variable selection.
(1) To introduce a nonparametric classification framework for high-dimensional data. The aim is to integrate a nonparametric component into classical parametric classification methods (e.g., penalized logistic regression, linear discriminant analysis) in high-dimensional settings without incurring much additional computational burden. Asymptotic properties of the excess risk are investigated.
(2) To investigate the asymptotic properties of cross-validation for tuning parameter selection in high-dimensional variable selection. The goal is a systematic study of the asymptotic behavior of the major cross-validation methods for choosing the tuning parameter when various penalty functions (LASSO, SCAD, MCP, etc.) are used. By delineating the properties of classical cross-validation, a new modified cross-validation method is developed for choosing the optimal tuning parameter along the solution path, one that achieves model selection consistency.
(3) To introduce the notion of asymptotic stability for maximum penalized likelihood estimators. Despite the extensive literature on maximum penalized likelihood estimators in high-dimensional settings, research on the stability of these estimators has been very limited. The investigators aim to introduce the notion of asymptotic stability for a general class of maximum penalized likelihood estimators, study the behavior of these estimators, and evaluate their performance under different penalty functions.
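For readers unfamiliar with the classical procedure that objective (2) takes as its starting point, the following is a minimal sketch of K-fold cross-validation for choosing the LASSO tuning parameter over a grid. It is an illustration under stated assumptions, not the investigators' modified method, which is not reproduced here. The function names (`lasso_cd`, `cv_choose_lambda`), the simulated data, the grid of tuning parameters, and the omission of an intercept are all hypothetical choices made for the example.

```python
import numpy as np


def soft_threshold(z, t):
    """Soft-thresholding operator used in the LASSO coordinate update."""
    return np.sign(z) * max(abs(z) - t, 0.0)


def lasso_cd(X, y, lam, n_iter=100):
    """Coordinate descent for (1/(2n)) * ||y - X b||^2 + lam * ||b||_1.

    Runs a fixed number of sweeps (no convergence check) and fits no intercept.
    """
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y.copy()  # equals y - X @ beta while beta is zero
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the residual that excludes feature j
            rho = X[:, j] @ resid / n + col_sq[j] * beta[j]
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            resid += X[:, j] * (beta[j] - new_bj)
            beta[j] = new_bj
    return beta


def cv_choose_lambda(X, y, lambdas, k=5, seed=0):
    """Classical K-fold CV: return the lambda with the smallest mean held-out squared error."""
    n = X.shape[0]
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), k)
    cv_err = np.zeros(len(lambdas))
    for test_idx in folds:
        train_mask = np.ones(n, dtype=bool)
        train_mask[test_idx] = False
        X_tr, y_tr = X[train_mask], y[train_mask]
        for i, lam in enumerate(lambdas):
            beta = lasso_cd(X_tr, y_tr, lam)
            cv_err[i] += np.mean((y[test_idx] - X[test_idx] @ beta) ** 2) / k
    best = int(np.argmin(cv_err))
    return lambdas[best], cv_err


# Simulated sparse linear model with more features than observations (assumed setup)
rng = np.random.default_rng(1)
n, p = 50, 80
X = rng.standard_normal((n, p))
true_beta = np.zeros(p)
true_beta[:3] = [2.0, -2.0, 1.5]
y = X @ true_beta + 0.5 * rng.standard_normal(n)

# Grid from the smallest lambda that zeroes every coefficient down to 1% of it
lam_max = np.max(np.abs(X.T @ y)) / n
lambdas = lam_max * np.geomspace(1.0, 0.01, 8)

best_lam, cv_err = cv_choose_lambda(X, y, lambdas)
selected = np.flatnonzero(lasso_cd(X, y, best_lam))
print("chosen lambda:", best_lam)
print("selected features:", selected)
```

Because this criterion targets held-out prediction error, the features it selects need not coincide with the true support, which is the gap between prediction-oriented tuning and model selection consistency that objective (2) addresses.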
The analysis of 'big data', now pervasive across many scientific disciplines, poses challenges as well as opportunities for the field of statistics. A major goal of this proposal is to make methodological and theoretical contributions to the important and challenging topic of high-dimensional classification and variable selection. The proposed research will have broad impacts on many disciplines, including the health and life sciences, economics, finance, astronomy and sociology, among others. In these fields, variable selection, feature extraction, and sparsity exploration are crucial for knowledge discovery. The investigators have been interacting with researchers at the New York State Psychiatric Institute at the Columbia University Medical Center, the Computational Biology Center of the Memorial Sloan-Kettering Cancer Center, and the Center for Computational Learning Systems at Columbia University. The results of the proposed investigations will be used for understanding mental health issues, for identifying risk factors for cancer, and for predicting failures in complex engineering systems. On the educational side, the proposed work will be incorporated into new courses on state-of-the-art high-dimensional statistical learning. It will also be integrated into the training of undergraduate and graduate students, especially those from under-represented groups, through Ph.D. dissertations and undergraduate research projects.
Status | Finished |
---|---|
Effective start/end date | 7/1/13 → 6/30/16 |
Funding
- National Science Foundation: US$129,980.00
ASJC Scopus Subject Areas
- Statistics and Probability
- Mathematics(all)