Randomized optimization algorithms for large-scale data systems

  • Stein, Clifford C. (PI)

Description

The scale of modern datasets often requires that computation be done in more restricted or structured computational settings than on a standard serial computer. Even when using a standard computer to process large datasets, we can usually only afford a runtime that is not much larger than the dataset size. To solve such large-scale problems efficiently, we must adopt new paradigms - such as randomized or (small-)approximation algorithms - for which we need to develop a new set of algorithmic tools. Our focus is on developing algorithms in such modern computational settings, including the massively parallel models (capturing real-world systems such as MapReduce) and the streaming model. The algorithms will solve critical problems for systems that use artificial intelligence and machine learning in a variety of domains, including climate science, imaging, and sensor networks.

We will study multiple algorithmic questions in modern computational settings. One major such setting is the massively parallel computation model (MPC), which models cluster computing systems such as the influential MapReduce and many systems that followed, where a computational task is performed collaboratively by a number of machines. Another such setting is the streaming computing model, where the algorithm sees the input as a stream and maintains only a small sketch (synopsis) of it. Even when the problem is to be solved in a standard single-processor model, it is often only feasible to solve the problem in time as close as possible to linear - a goal which often requires tools closely related to those for the aforementioned settings.

Methodologically, a unifying theme in the above settings is that more efficient algorithms are possible via the use of efficient data representations, such as dimension-reduced vectors, sketches, or graph spanners/emulators.
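As a minimal illustration of the small-sketch streaming computation described above (not one of the project's own algorithms), the classic Misra-Gries summary processes a stream in one pass and keeps at most k - 1 counters, guaranteeing that every item occurring more than n/k times survives in the summary:

```python
def misra_gries(stream, k):
    """One-pass Misra-Gries summary: keeps at most k - 1 counters.
    Any item with frequency > n/k is guaranteed to remain in the summary."""
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # Decrement all counters; drop any that reach zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters

# Example stream of length 100; "a" and "b" dominate, so they survive.
stream = ["a"] * 50 + ["b"] * 30 + list("cdefg") * 4
summary = misra_gries(stream, 4)
```

The memory used depends only on k, not on the stream length - the defining property of the sketch-based streaming algorithms this project studies.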
Such efficient data representations can be conceptualized as 'functional' compression of data: compression that preserves only certain attributes, those crucial for the computational task at hand. Such representations allow for more efficient handling of objects: most directly, it takes less time and space to handle a lower-dimensional vector, though many applications are more involved.

The objective of this project is to develop more efficient algorithms for classic problems in the models described above, using and co-developing efficient data representations and novel algorithmic techniques. We will design and implement algorithms for basic graph problems such as shortest paths, matchings, and flows; optimization algorithms such as linear programming and gradient descent; and algorithms that are the building blocks of modern machine learning systems. In particular, we focus on three research directions: general optimization, ML optimization problems, and graph problems on high-dimensional datasets. Generic problems such as linear programming are widely used in scientific applications. Hence, our first goal is to develop new MPC (and streaming) algorithms for such general optimization problems. Furthermore, many specialized optimization problems arise in machine learning. While they can be solved by general solvers, oftentimes they admit more efficient algorithms due to further structure. The second and third research directions focus on such specialized optimization problems that are core to machine learning systems. These two directions roughly correspond to ML optimization problems and graph problems on high-dimensional datasets, respectively. We will also work on bridging the gap between theory and practice of our developed tools, via implementation and testing.

Approved for Public Release
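To make the optimization building blocks mentioned above concrete, here is a toy sketch (purely illustrative, not the project's methods) of gradient descent fitting a one-dimensional least-squares model:

```python
def gradient_descent_least_squares(xs, ys, steps=500, lr=0.05):
    """Fit y ~ w*x + b by gradient descent on mean squared error -
    a toy instance of the optimization primitives discussed above."""
    w = b = 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradients of (1/n) * sum((w*x + b - y)^2) with respect to w and b.
        gw = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        gb = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= lr * gw
        b -= lr * gb
    return w, b

# Data generated from y = 3x + 1; gradient descent should recover w ~ 3, b ~ 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [3 * x + 1 for x in xs]
w, b = gradient_descent_least_squares(xs, ys)
```

The specialized ML optimization problems the project targets have far more structure than this toy example, which is exactly what more efficient algorithms can exploit.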

Status: Finished
Actual start/end date: 9/1/22 → 9/1/22

Keywords

  • Artificial intelligence
  • Social sciences (all)
