Collaborative Research: Halfspace Depth for Object and Functional Data

Project: Research project

Project Details

Description

Complex data objects are increasingly being generated across science and engineering. Non-Euclidean data such as wind directions, neural connectivity networks, and phylogenetic trees attract practical interest but are challenging to analyze because of their intrinsic constraints. Functional data such as trajectories and images, which are observed on a continuous domain in time or space, are another type of highly complex data. In general, practitioners are interested in first exploring the data distributions before any modeling. For instance, given a sample of growth trajectories of children, a first step is to identify typical versus extreme growth patterns, where the latter can be non-trivial to uncover. Also, when analyzing brain connectivity matrices, it is important to find unusual brain networks and differences between healthy and diseased populations. Data-driven methods robust to anomalies are essential in these settings since little is known about the data generating process, and outliers can affect the analysis. Due to the lack of a natural ordering in data objects, exploratory tools such as boxplots and quantiles are unavailable for these types of data. The project will address the lack of techniques for exploring non-Euclidean and functional data. Principled statistical and visualization methods will be developed based on a novel way of ranking the observations. The project will also provide training for graduate and undergraduate students.

The central research theme is to develop exploratory data analysis tools for non-Euclidean and functional data objects. To overcome the absence of a canonical ordering for object data, the PIs will develop suitable data depth notions to quantify the centrality of data points with respect to the distribution. This will provide a center-outward ranking of the data that will be used as a building block for outlier detection methods, rank tests, and robust classifiers. Analogous to Tukey's halfspace depth for the multivariate Euclidean case, the new depth notions for object data are expected to be intuitive and robust, and to have desirable properties well-grounded in theory. Specifically, the research project will investigate a depth notion for non-Euclidean objects; a data visualization procedure and an outlier detection procedure for non-Euclidean data; halfspace depth notions for functional data, one based on theory and another from an algorithmic perspective; and a depth notion for sparsely observed longitudinal data. Key challenges that will be addressed include the lack of vector space structure when dealing with non-Euclidean objects; the infinite dimensionality and degeneracy that arise when defining depth notions for functional data; detecting trajectories and images that are outlying in shape and not just at individual time points; and the sparsity and irregularity of observations in longitudinal data. Method and theory development will draw from metric geometry, functional data analysis, empirical process theory, and M-estimation. Software implementing a suite of depth-based methods will be made available to the public as an outcome of the project.
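
As background for the analogy to Tukey's halfspace depth, the classical multivariate Euclidean version can be sketched in a few lines. This illustrates only the standard Euclidean notion, not the depth notions the project proposes for object or functional data. The function name `halfspace_depth` and the parameters `n_directions` and `seed` are hypothetical, and the random-direction search is an approximation that can only overestimate the exact depth.

```python
import numpy as np

def halfspace_depth(point, sample, n_directions=2000, seed=0):
    """Approximate Tukey halfspace depth of `point` with respect to `sample`.

    The depth is the smallest fraction of sample points lying in any closed
    halfspace whose boundary passes through `point`. Here the minimum is
    searched over random unit directions only (an assumption made for
    brevity), so the result is an upper bound on the exact depth.
    """
    rng = np.random.default_rng(seed)
    sample = np.asarray(sample, dtype=float)
    point = np.asarray(point, dtype=float)
    dim = sample.shape[1]

    # Draw random unit vectors to serve as halfspace normals.
    directions = rng.standard_normal((n_directions, dim))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)

    # Signed position of every sample point relative to the query point,
    # along every direction: shape (n_samples, n_directions).
    projections = (sample - point) @ directions.T

    # Fraction of the sample in each closed halfspace, then the minimum.
    fractions = (projections >= 0).mean(axis=0)
    return float(fractions.min())

# Four points arranged symmetrically around the origin.
cross = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
depth_center = halfspace_depth([0.0, 0.0], cross)   # central point
depth_outside = halfspace_depth([2.0, 0.0], cross)  # point outside the cloud
```

Computing this depth for every observation and sorting from largest to smallest gives the kind of center-outward ranking described above: deep points are typical, and points with depth near zero are candidates for outliers.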

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Status: Finished
Effective start/end date: 7/1/21 to 6/30/24

Funding

  • National Science Foundation: US$174,999.00

ASJC Scopus Subject Areas

  • Statistics and Probability
  • Mathematics (all)
