Project Details
Description
My lab will use machine learning to build physically-grounded models of biomolecules and their interactions and
apply these models at proteome (genome) scale to address basic questions in the systems biology of human
signaling. On the modeling front, our efforts will focus on building computational models of protein-ligand
interactions, with a specific emphasis on post-translationally modified ligands that cells widely employ in signaling
networks. I hypothesize that a step change in accuracy and generality of protein-ligand interaction models is
possible using deep learning advances in protein structure prediction and protein representation learning. My
lab has been at the forefront of these advances, having developed the first end-to-end differentiable model of
protein structure prediction (RGN); the first protein language model (UniRep), a key technique for learning
mathematical representations that capture chemical, structural, and evolutionary properties of proteins; and one
of the first deep learning methods for protein-protein interactions (HSM). We will leverage our expertise in these
domains to predict protein-ligand interactions based on both sequence and structure information. We will further
develop specialized models that predict protein structures and alternate protein conformations, and we will use
these predictions as inputs to our protein-ligand interaction models.
On the biological front, we will employ these machine-learned models to assemble person-specific signaling
networks to understand how normal allelic variation is manifested at the level of signaling networks, and how
these networks are perturbed in human diseases. To study general variation in signaling networks, we will use
exome sequences (UK Biobank and NHLBI TOPMed) to build individualized networks that map person-specific
protein sequences to protein-ligand affinities. We will quantify how network topology varies among individuals
and populations and test whether disease-associated traits correlate with topology. We will also compare
networks of healthy and disease-afflicted persons to identify topological differences that predispose individuals
to genetic diseases. Ultimately, I expect machine-learned models to be sufficiently predictive of ligand binding
that mechanistic understanding of pathway rewiring by mutations is possible. While my focus will be
computational, I expect to collaborate closely with the Fordyce Lab (Stanford) to experimentally
characterize and validate protein-ligand interactions and with the Shen Lab (Columbia) to perform statistical genetic
analyses, exploiting synergies at the interface of computation and experimentation.
Status | Finished |
---|---|
Effective start/end date | 9/22/23 → 8/31/24 |
ASJC Scopus Subject Areas
- Artificial Intelligence
- Computational Mathematics