Project Details
Description
PROJECT SUMMARY/ABSTRACT: Traditional genomic analyses like Genome-Wide Association Studies
(GWAS) and Polygenic Scores (PGS) have greatly enhanced our knowledge of diseases, yet they also have
certain drawbacks. Their reliance on binary traits ignores phenotypic heterogeneity, which discards
information and may impede the progress of personalized medicine. Furthermore, case-control ratio
imbalance reduces the effectiveness of GWAS for binary traits. Although using logistic regression
on selection-biased samples can boost statistical power, it is suboptimal for risk prediction as the interpretation
of observed effect sizes, such as odds ratios, may not accurately represent the target population. There is yet to
be a multi-trait model that effectively addresses all these issues.
My doctoral and postdoctoral work experience has laid the foundation for the Liability Threshold-based
Phenotypic Integration (LTPI) model. Designed to enhance disease association mapping and risk prediction
accuracy, this model leverages comprehensive individual disease histories captured by Electronic Health
Records (EHRs), revealing shared genetic factors between target and non-target traits. Our initial research using
UK Biobank and eMERGE data demonstrated that incorporating phenotypic information from both target and
non-target phenotypes significantly improves the accuracy of disease risk prediction. However, a GWAS analysis
using the LTPI score as a dependent variable identified several false positives. This outcome prompted questions
about the LTPI model's assumption of homogeneity in genetic covariance estimates across the genome, a
common assumption in conventional multi-trait methods.
In Aim 1, I plan to develop a multi-trait model to maximize statistical power and control false positives. Based on
a liability threshold model for binary and continuous phenotypes in EHR, the proposed model will consider
genetic and non-genetic factors and use importance sampling methods to manage estimation bottlenecks. For
Aim 2, I propose a GWAS framework built on the model from Aim 1 that will utilize locus-specific risk scores. I
aim to explore and validate the optimality of model parameters, including locus lengths, covariance matrix
estimates, and trait selection. Ultimately, my effort will lead to a successful GWAS on Chronic Kidney Disease
and Anxiety disorder. In Aim 3, I will undertake a large-scale analysis of multiple traits, mapping each disease
with related non-target traits per locus, assessing disease risk, and observing individual disease susceptibility
patterns. This analysis will draw on massive population cohort data from the eMERGE network, the UK Biobank,
the Million Health Discovery Program, and All of Us to deliver meaningful scientific contributions.
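
As a minimal sketch of the liability threshold idea that Aim 1 builds on (not the LTPI model itself), the Python snippet below treats a single binary trait as a standard normal liability that exceeds a threshold set by the disease prevalence. It then computes the expected liability of a case or a control as the mean of the corresponding truncated normal. The function names, the single-trait setting, and the standard normal assumption are illustrative choices, not details taken from the project.

```python
from statistics import NormalDist

# Assumption for illustration: liability follows a standard normal distribution.
STD_NORMAL = NormalDist()


def liability_threshold(prevalence: float) -> float:
    """Return the threshold t such that P(liability > t) equals the prevalence."""
    return STD_NORMAL.inv_cdf(1.0 - prevalence)


def expected_liability(is_case: bool, prevalence: float) -> float:
    """Mean of the truncated standard normal implied by case/control status."""
    t = liability_threshold(prevalence)
    density_at_t = STD_NORMAL.pdf(t)
    if is_case:
        # Upper tail: E[l | l > t] = pdf(t) / P(l > t)
        return density_at_t / prevalence
    # Lower tail: E[l | l <= t] = -pdf(t) / P(l <= t)
    return -density_at_t / (1.0 - prevalence)


if __name__ == "__main__":
    for k in (0.01, 0.10):
        print(k, expected_liability(True, k), expected_liability(False, k))
```

Under these assumptions, a case of a rarer disease receives a higher expected liability than a case of a common one, which gives one intuition for why a continuous liability scale carries more information than the binary label alone.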
Status | Active |
---|---|
Effective start/end date | 9/20/24 → 8/31/25 |
ASJC Scopus Subject Areas
- Health Informatics