CAREER: Unifying short and long read RNA-seq analysis of alternative splicing using network flow models

  • Knowles, David (PI)

Project

Project details

Description

This award is funded in whole or in part under the American Rescue Plan Act of 2021 (Public Law 117-2).

Alternative splicing (AS), i.e., variation in which regions of a precursor messenger RNA are removed, is a crucial cellular process in higher organisms. It has been found to be key for cellular responses to external stimuli, from immune response in mammals to climate adaptation in plants, and it has been implicated in common diseases, from autoimmune (e.g., multiple sclerosis) to neurodegenerative (e.g., Alzheimer's disease). Despite its importance, the tools to study AS have been limited by technological (short read) and conceptual problems. The former problem is now being addressed by the increasing availability and affordability of long read sequencing. However, the latter problem remains unaddressed. Long reads have been used to identify which alternatively spliced isoforms are present in cells, but not to statistically identify changes in AS or the specific splicing events involved. This research project aims to fundamentally transform how AS is analyzed by building a unified framework that makes joint use of all available data (both long and short reads). Software tools will be developed under the FAIR (Findable, Accessible, Interoperable, Reusable) criteria, including open source sharing of code and package distribution. An annual three-day codeathon based at the New York Genome Center (NYGC), specifically targeting women and underrepresented minorities (recruited from Barnard College and Hunter College, respectively), will be organized.

This project proposes a new conceptual framework for AS analysis that 1) unifies local and isoform-level quantification, 2) accounts for uncertainty to enable powerful statistical testing, 3) enables exploratory data analysis of large-scale datasets, and 4) can jointly leverage short and/or long read RNA-seq data. The framework will use network flow algorithms to connect local splicing events with isoform usage rates, and the project will develop well-calibrated statistical tests and convex dimensionality reduction techniques that account for varying noise levels. These tools will be developed in the context of impactful real-world applications (neurodevelopmental gene regulation, neurodegenerative disease progression, and spliceosomal mutant cancer) in close collaboration with colleagues at the NYGC, the non-profit research institute where the PI is co-affiliated. Results from the project will be posted at https://daklab.github.io/the_splice_must_flow/.
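
As a rough illustration of how a flow view can tie isoform usage to local splicing events, the following minimal sketch may help. It is not the project's method: the toy gene (a single cassette exon), the isoform names, the usage rates, and all function names are hypothetical. Each isoform is treated as a source-to-sink path in a splice graph, the flow on an edge (junction) is the summed usage of the isoforms crossing it, and local splicing ratios are read off as shares of the flow leaving a node.

```python
from collections import defaultdict

# Hypothetical toy gene: exons E1, E2, E3, where E2 is a cassette exon.
# Each isoform is a path from source "S" to sink "T" through the splice graph.
ISOFORMS = {
    "inclusion": ["S", "E1", "E2", "E3", "T"],
    "skipping": ["S", "E1", "E3", "T"],
}


def edge_flows(isoforms, usage):
    """Sum isoform usage over every edge (junction) each isoform path traverses."""
    flows = defaultdict(float)
    for name, path in isoforms.items():
        for u, v in zip(path, path[1:]):
            flows[(u, v)] += usage[name]
    return dict(flows)


def is_conserved(flows, tol=1e-9):
    """Check flow conservation: inflow equals outflow at every internal node."""
    inflow, outflow = defaultdict(float), defaultdict(float)
    for (u, v), f in flows.items():
        outflow[u] += f
        inflow[v] += f
    internal = (set(inflow) | set(outflow)) - {"S", "T"}
    return all(abs(inflow[n] - outflow[n]) <= tol for n in internal)


def local_ratios(flows, node):
    """Local splicing ratios: share of the flow leaving `node` on each outgoing edge."""
    out = {edge: f for edge, f in flows.items() if edge[0] == node}
    total = sum(out.values())
    return {edge: f / total for edge, f in out.items()}


usage = {"inclusion": 0.7, "skipping": 0.3}  # made-up isoform usage rates
flows = edge_flows(ISOFORMS, usage)
ratios = local_ratios(flows, "E1")
```

In this toy case the ratios at E1 simply recover the isoform usage rates, because the two isoforms differ at a single event. With several events per gene, many isoform mixtures can induce the same edge flows, which is one reason an edge-level flow representation can be a convenient common ground for local and isoform-level quantification.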

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Status: Active
Start date/End date: 3/1/22 to 2/28/27

Funding

  • National Science Foundation: $393,791.00

Keywords

  • Clinical Neurology
  • Biochemistry, Genetics and Molecular Biology (all)
