Project details
Description
RNA-Sequencing (RNA-Seq) analysis provides a critical means to understand gene functions. High-throughput RNA-Seq data are frequently measured under multiple conditions from the same set of samples. For example, in the NIH Common Fund's Genotype-Tissue Expression (GTEx) project, samples from different tissues are collected from each post-mortem donor for sequencing. In another study on ultraviolet (UV) radiation, skin keratinocytes from the same set of subjects are exposed to different radiation doses and durations before sequencing. Such common-sample, multi-condition RNA-Seq data have information shared across both samples and conditions, and have the potential to provide key insights into gene functions. However, despite great endeavors to collect such data, there is a lack of analytical methods and computational tools to maximize their potential. Important tasks such as missing data imputation, functional gene module identification and association analysis remain unaddressed. In this proposal, we will build an innovative and powerful paradigm to analyze multi-condition RNA-Seq data and thus improve our understanding of gene functions. To leverage information across conditions, samples and genes simultaneously, we propose to model RNA-Seq data as multi-way tensor arrays. We will develop novel tensor methods and theory that are appropriate for read count data. In particular, our first aim is to extend tensor completion methods for block-wise missing RNA-Seq data imputation. By modeling unobserved samples as missing blocks in a tensor, we will aggregate information along different modes (subjects, conditions, genes) to impute missing values. The second aim develops flexible tensor co-clustering methods, which simultaneously cluster genes, samples and conditions, for co-expressed gene module identification.
The third aim is to build new tensor response regression models to associate gene modules with genotype and covariates, which will provide insights into genetic regulation such as expression quantitative trait loci (eQTL). Finally, in the fourth aim, we will develop scalable statistical software to implement the proposed methods and make them more broadly applicable. We will apply the methods to the GTEx multi-tissue data and UV multi-condition data, and gain novel insights into gene expression and regulation. The proposed research will likely transform how we analyze multi-condition RNA-Seq data and enhance our understanding of human genomics and its relation to public health.
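The block-wise missing setting described in the first aim can be pictured with a toy example. The sketch below is only an illustration under stated assumptions and is not the method proposed in the project: it stores expression values in a subjects × conditions × genes array, marks whole (subject, condition) samples as unobserved, and fills them with a generic iterative truncated-SVD imputation on the subject-mode unfolding. The function name `impute_missing_blocks`, the rank, the iteration count, and the simulated rank-1 data are all hypothetical choices.

```python
import numpy as np


def impute_missing_blocks(tensor, observed, rank=1, n_iter=200):
    """Fill unobserved (subject, condition) samples in a 3-way array.

    tensor   : array of shape (subjects, conditions, genes); entries for
               unobserved samples may be NaN and are ignored.
    observed : boolean array of shape (subjects, conditions), True where the
               sample was sequenced.
    This is a generic low-rank baseline (hypothetical), not the proposal's method.
    """
    n_subj, n_cond, n_gene = tensor.shape
    # Expand the sample-level mask so that it covers every gene of a sample.
    mask = np.repeat(observed[:, :, None], n_gene, axis=2)

    # Start from the per-(condition, gene) mean over the observed subjects.
    zero_filled = np.where(mask, tensor, 0.0)
    counts = np.maximum(mask.sum(axis=0), 1)
    means = zero_filled.sum(axis=0) / counts
    filled = np.where(mask, tensor, means[None, :, :])

    for _ in range(n_iter):
        # Subject-mode unfolding: one row per subject, conditions and genes as columns.
        unfolded = filled.reshape(n_subj, n_cond * n_gene)
        u, s, vt = np.linalg.svd(unfolded, full_matrices=False)
        low_rank = (u[:, :rank] * s[:rank]) @ vt[:rank]
        # Keep observed entries fixed; update only the missing blocks.
        filled = np.where(mask, tensor, low_rank.reshape(tensor.shape))
    return filled


# Simulated data (assumption): 6 subjects, 3 conditions, 5 genes, rank-1 signal.
rng = np.random.default_rng(0)
subject_effect = rng.uniform(1.0, 2.0, size=6)
condition_gene = rng.uniform(1.0, 3.0, size=(3, 5))
truth = subject_effect[:, None, None] * condition_gene[None, :, :]

observed = np.ones((6, 3), dtype=bool)
for subj, cond in [(0, 1), (2, 2), (4, 0)]:
    observed[subj, cond] = False  # this sample was never sequenced

tensor = truth.copy()
tensor[~observed] = np.nan  # blank out every gene of the missing samples

imputed = impute_missing_blocks(tensor, observed, rank=1)
print("max abs imputation error:", np.abs(imputed - truth).max())
```

Unfolding along the subject mode matters here: if samples were stacked as rows of a samples × genes matrix, a missing sample would be an entirely empty row with nothing to borrow from, whereas in the tensor layout the same subject still has observed values under other conditions.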
Status | Finished |
---|---|
Start date/End date | 3/1/21 → 2/28/22 |
Funding
- National Human Genome Research Institute: $224,675.00
Keywords
- Genetics