Conditional Quantile Random Forest with Biomedical and Biological Applications

Project

Project details

Description

Modern biology and biomedical science are experiencing a wave of machine learning applications as biological data sets become larger and more complex. Among these methods, random forests are particularly appealing and have gained great popularity in biological studies, genomic data analysis, and biomedical science. They offer great flexibility in modeling complex data and associations while retaining a degree of interpretability and a transparent decision mechanism. The project aims to develop a new framework, the conditional quantile random forest (CQRF), which substantially generalizes existing approaches. The project will investigate its potential for advancing biology and biomedical science, with focused applications to the analysis of electronic medical records and genomic data. Once carried out, the research could lead to new knowledge discoveries and new precision interventions in biomedical science. The project also provides research training opportunities for graduate students.

Handling complex and heterogeneous associations is one of the major challenges in obtaining meaningful inferences in big data applications. Machine learning methods, including random forests, have been successful in capturing heterogeneity in the covariate space. In many applications, however, researchers have observed that the associations of interest can also vary with the outcome. The research builds on conditional quantile regression within recursive sampling partitions and predictions, which adds the flexibility to adjust for heterogeneous associations across the outcome distribution. The built-in conditional regression models also make inference possible, which is another critical consideration for medical applications of machine learning approaches but has not been well explored or understood. As a result, the conditional quantile random forest provides more accurate predictions and risk assessments, enhances the statistical power for detecting biomarkers, and provides a new approach to treatment selection. At the same time, the framework is built on a quantile model with a high-dimensional interactive function, which is new to the quantile regression literature and could significantly enhance the capacity of quantile regression in large-scale applications.
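To make the idea of quantile-specific associations concrete, the following is a minimal sketch of two generic building blocks: the check (pinball) loss that quantile regression minimizes, and an empirical quantile computed separately within each side of a single covariate split. It is not the CQRF method described in this project; the function names, the simulated data, and the split point are all assumptions chosen for illustration.

```python
import numpy as np


def pinball_loss(y, q, tau):
    """Average check (pinball) loss of predicting q as the tau-th quantile of y."""
    diff = np.asarray(y, dtype=float) - q
    return float(np.mean(np.maximum(tau * diff, (tau - 1.0) * diff)))


def partition_quantiles(x, y, split, tau):
    """Empirical tau-th quantile of y on each side of a single split on x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    left = y[x <= split]
    right = y[x > split]
    return float(np.quantile(left, tau)), float(np.quantile(right, tau))


# Hypothetical data: the noise scale grows with x, so the covariate shifts
# the upper quantiles of the outcome more strongly than it shifts the median.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=2000)
y = x + (0.5 + 2.0 * x) * rng.normal(size=2000)

# Illustrative split at x = 0.5; a tree-based method would search for splits.
q_left, q_right = partition_quantiles(x, y, split=0.5, tau=0.9)
m_left, m_right = partition_quantiles(x, y, split=0.5, tau=0.5)

print("0.9 quantile (left, right):", q_left, q_right)
print("0.5 quantile (left, right):", m_left, m_right)
```

In this simulated setting, comparing the gap between partitions at the 0.9 quantile with the gap at the median shows the kind of outcome-dependent association the paragraph above refers to; a forest-based method would average such partition-level estimates over many randomized trees.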

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Status: Finished
Actual start/end date: 8/15/20 to 7/31/23

Funding

  • National Science Foundation: US$400,000.00

Keywords

  • Biochemistry, Genetics and Molecular Biology (all)
  • Medicine (all)
  • Mathematics (all)
