Collaborative Research: Learning and forecasting high-dimensional extremes: sparsity, causality, privacy

  • Davis, Richard A. (PI)
  • Avella Medina, Marco (CoPI)

Project: Research project

Project Details

Description

The principal goal of this research project is to learn how to forecast future extreme observations and to assess their impact. Almost daily, the public is inundated with news accounts of extreme observations arising from extraordinary climatic events, from extended and severe droughts to extraordinary precipitation records to record heat waves, that have reached virtually every region of the US in one form or another. These extreme events appear unexpectedly, can be dangerous, and occur in combinations that may or may not be coincidental. Do tropical storms become more deadly as global temperatures rise? Does extreme violence become more widespread as economic conditions worsen? Questions of this type are studied by climate scientists and social scientists, respectively, but statistical and probabilistic analysis of extreme values is an indispensable ingredient in any such study. Modern statistical analysis of extremes is both blessed and cursed by the deluge of available data. The available data are often high dimensional and contaminated. The need for quick forecasts of future extremes and corresponding policy updates requires online analysis of extremes. This research aims to evaluate the causal impacts of various factors, drawn from a potentially large array of variables including changing environmental conditions, demographic movements within the US, changing landscapes, and changing economic conditions, on the frequency and magnitude of extreme events. From many variables, the hope is to produce methodology that extracts the important features in the data that directly affect the description and prediction of extremes. This research also revolves around the notion of differential privacy and aims to develop tools for releasing global characteristics of a data set without revealing individual-level information.
The focus of this research will be on developing differential privacy procedures tailored to the extreme value characteristics of large data sets, which is challenging because extreme observations are precisely the ones that reveal the most individual information. An overarching objective of this research project is to adapt modern statistical learning tools to the problem of forecasting extremes. Learning the structure of extremes presents difficult challenges due both to the limited number of extreme observations and to the scarcity of extremal labels. One approach is to develop methods for detecting nonlinear sets of much smaller dimension that can provide an adequate description of extremes in high dimensions. A main thrust of this research is to develop powerful modern learning techniques (such as graph-based learning methods and kernel principal component analysis) that allow one to determine the extremal support from the data. A second main thrust of this research centers on causality in both small- and large-dimensional problems. In its most basic form, a set of variables X is said to be tail causal to a dependent vector Y if certain changes in X (sometimes themselves extreme, but not always so) impact the tail behavior of Y. The potential outcomes framework for causality of extreme events will be a major focus of this proposal’s research agenda. A third main thrust of this research concerns differential privacy in the context of extremes, which provides tools for releasing global characteristics of a data set without revealing individual-level information.
This is achieved by modifying the data before release, in particular by randomizing it, in such a way that the output of the procedure does not depend too much on any specific observation while still allowing statistical inference for certain characteristics of the original data set.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
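The randomization idea in the description can be illustrated with a minimal sketch. It uses the textbook Laplace mechanism to release a noisy mean, and it is not the procedure this project proposes. The function names `laplace_noise` and `private_mean`, the clipping bounds, and the privacy parameter `epsilon` are all assumptions made for illustration. The sketch also hints at why extremes are hard to handle: clipping to a bounded range is what limits the influence of any single observation, and it is also what discards the tail.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # follows a Laplace distribution centred at 0 with that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_mean(values, lower, upper, epsilon, rng):
    """Release a noisy mean of `values` (illustrative Laplace mechanism).

    Assumptions: every record is clipped to [lower, upper], and two data
    sets are neighbours when they differ in exactly one record.
    """
    if not values:
        raise ValueError("values must be non-empty")
    if epsilon <= 0 or upper <= lower:
        raise ValueError("need epsilon > 0 and upper > lower")
    # Clipping bounds how far one record can move the mean.
    clipped = [min(max(v, lower), upper) for v in values]
    n = len(clipped)
    # Replacing one record changes the clipped mean by at most this amount.
    sensitivity = (upper - lower) / n
    # Laplace noise with scale sensitivity / epsilon masks any single record.
    return sum(clipped) / n + laplace_noise(sensitivity / epsilon, rng)


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical data: 1000 moderate values plus one extreme record.
    data = [rng.gauss(10.0, 2.0) for _ in range(1000)] + [500.0]
    print(private_mean(data, lower=0.0, upper=20.0, epsilon=1.0, rng=rng))
```

In this example the extreme record 500.0 is clipped to 20.0 before averaging, so the released value carries almost no information about the tail. That tension is what privacy procedures tailored to extremes would have to address.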
Status: Active
Effective start/end date: 8/15/23 to 7/31/26

ASJC Scopus Subject Areas

  • Statistics and Probability
  • Mathematics (all)
  • Physics and Astronomy (all)
