Multi-Trait Analysis in Large-Scale Biobank Datasets Linked to Electronic Health Records.

Lee, Hyunkyu H (PI)

Columbia University Irving Medical Center

Project

Description

PROJECT SUMMARY/ABSTRACT: Traditional genomic analyses such as Genome-Wide Association Studies (GWAS) and Polygenic Scores (PGS) have greatly enhanced our knowledge of diseases, yet they have notable drawbacks. Their reliance on binary traits ignores phenotypic heterogeneity, resulting in lost information and potential barriers to the progress of personalized medicine. Furthermore, case-control ratio imbalance reduces the effectiveness of GWAS for binary traits. Although using logistic regression on selection-biased samples can boost statistical power, it is suboptimal for risk prediction because observed effect sizes, such as odds ratios, may not accurately represent the target population. There is yet to be a multi-trait model that effectively addresses all these issues. My doctoral and postdoctoral work has laid the foundation for the Liability Threshold-based Phenotypic Integration (LTPI) model. Designed to enhance disease association mapping and risk prediction accuracy, this model leverages comprehensive individual disease histories captured by Electronic Health Records (EHRs), revealing shared genetic factors between target and non-target traits. Our initial research using UK Biobank and eMERGE data demonstrated that incorporating phenotypic information from both target and non-target phenotypes significantly improves the accuracy of disease risk prediction. However, a GWAS using the LTPI score as the dependent variable identified several false positives. This outcome prompted questions about the LTPI model's assumption that genetic covariance estimates are homogeneous across the genome, a common assumption in conventional multi-trait methods. In Aim 1, I plan to develop a multi-trait model that maximizes statistical power while controlling false positives. Based on a liability threshold model for binary and continuous phenotypes in EHRs, the proposed model will consider genetic and non-genetic factors and use importance sampling methods to manage estimation bottlenecks. For Aim 2, I propose a GWAS framework built on the model from Aim 1 that will utilize locus-specific risk scores. I aim to explore and validate the optimality of model parameters, including locus lengths, covariance matrix estimates, and trait selection. Ultimately, my effort will lead to a successful GWAS on chronic kidney disease and anxiety disorder. In Aim 3, I will undertake a large-scale analysis of multiple traits, mapping each disease with related non-target traits per locus, assessing disease risk, and observing individual disease susceptibility patterns. This analysis will draw on massive population cohort data from the eMERGE network, the UK Biobank, the Million Health Discovery Program, and All of Us to deliver meaningful scientific contributions.

Status	Active
Start date/End date	9/20/24 → 8/31/25

Keywords

Health Informatics

Access the project

https://reporter.nih.gov/project-details/10866025
