Differential Expression, Functional and Machine Learning Analysis of High-Throughput –Omics Data Using Open-Source Tools

Moritz Kebschull; Annika Therese Kroeger; Panos N. Papapanou

doi:10.1007/978-1-0716-2780-8_19

College of Dental Medicine

Abstract

Today, –omics analyses, including the systematic cataloging of messenger RNA and microRNA sequences or DNA methylation patterns in a cell population, organ or tissue sample, allow for an unbiased, comprehensive, genome-level analysis of complex diseases, offering a major advantage over earlier “candidate” gene or pathway analyses. A primary goal in the analysis of these high-throughput assays is to detect, among several thousand features, those that differ between groups of samples. In the context of oral biology, our group has successfully used –omics technology to identify key molecules and pathways in different diagnostic entities of periodontal disease. A major issue when inferring biological information from high-throughput –omics studies is that the sheer volume of high-dimensional data generated by contemporary technology cannot be appropriately analyzed with the statistical methods commonly employed in the biomedical sciences. Furthermore, machine learning methods facilitate the detection of additional patterns, beyond the mere identification of lists of features that differ between groups. Herein, we outline a robust and well-accepted bioinformatics workflow for the initial analysis of –omics data using open-source tools. We describe a differential expression analysis pipeline that can be used for data from both array and sequencing experiments and that can account for random or fixed effects. Finally, we present an overview of options for the functional analysis of the resulting data, including subsequent machine learning approaches in the form of (i) supervised classification algorithms for class validation and (ii) unsupervised clustering for class discovery.
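One concrete reason that common statistical methods fall short with several thousand features is multiple testing. If 10,000 features with no true group difference are each tested at α = 0.05, about 10,000 × 0.05 = 500 false positives are expected. False discovery rate control is a widely used remedy. The block below is a minimal pure-Python sketch of the Benjamini-Hochberg adjustment. It is an illustration under stated assumptions, not the chapter's pipeline, which this abstract does not detail. The feature names and p-values are hypothetical.

```python
from typing import List, Sequence


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the original feature order."""
    m = len(pvalues)
    if m == 0:
        return []
    # Indices of features sorted by ascending raw p-value (rank 1 = smallest).
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        candidate = pvalues[i] * m / rank
        running_min = min(running_min, candidate)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical features and raw p-values from some per-feature test.
    features = ["gene_a", "gene_b", "gene_c", "gene_d"]
    raw_p = [0.01, 0.04, 0.03, 0.20]
    q = benjamini_hochberg(raw_p)
    significant = [f for f, qi in zip(features, q) if qi < 0.05]
    print(significant)  # ['gene_a']
```

In this toy example, a raw threshold of p < 0.05 would call three of the four hypothetical features. After adjustment, only one remains.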

Original language	English
Host publication title	Methods in Molecular Biology
Publisher	Humana Press Inc.
Pages	317-351
Number of pages	35
DOI	https://doi.org/10.1007/978-1-0716-2780-8_19
Publication status	Published - 2023

Publication series

Name	Methods in Molecular Biology
Volume	2588
ISSN (Print)	1064-3745
ISSN (Electronic)	1940-6029

Funding

Funders	Funder number
Colgate-Palmolive Inc.
German Society for Oral and Maxillo-Facial Sciences
German Society for Periodontology
National Institutes of Health
National Institute of Dental and Craniofacial Research	DE021820, DE024735, DE015649
Deutsche Gesellschaft für Zahn-, Mund- und Kieferheilkunde
Chinese Medical Association

ASJC Scopus Subject Areas

Genetics
Molecular Biology

Access to Document

10.1007/978-1-0716-2780-8_19

Cite this

Kebschull, M., Kroeger, A. T., & Papapanou, P. N. (2023). Differential Expression, Functional and Machine Learning Analysis of High-Throughput –Omics Data Using Open-Source Tools. In Methods in Molecular Biology (pp. 317-351). (Methods in Molecular Biology; Vol. 2588). Humana Press Inc. https://doi.org/10.1007/978-1-0716-2780-8_19

@inbook{ee4b71f7915044bd936540e19e4eb5bf,

title = "Differential Expression, Functional and Machine Learning Analysis of High-Throughput –Omics Data Using Open-Source Tools",

abstract = "Today, –omics analyses, including the systematic cataloging of messenger RNA and microRNA sequences or DNA methylation patterns in a cell population, organ or tissue sample, allow for an unbiased, comprehensive genome-level analysis of complex diseases, offering a large advantage over earlier “candidate” gene or pathway analyses. A primary goal in the analysis of these high-throughput assays is the detection of those features among several thousand that differ between different groups of samples. In the context of oral biology, our group has successfully utilized –omics technology to identify key molecules and pathways in different diagnostic entities of periodontal disease. A major issue when inferring biological information from high-throughput –omics studies is the fact that the sheer volume of high-dimensional data generated by contemporary technology is not appropriately analyzed using common statistical methods employed in the biomedical sciences. Furthermore, machine learning methods facilitate the detection of additional patterns, beyond the mere identification of lists of features that differ between groups. Herein, we outline a robust and well-accepted bioinformatics workflow for the initial analysis of –omics data using open-source tools. We outline a differential expression analysis pipeline that can be used for data from both arrays and sequencing experiments, and offers the possibility to account for random or fixed effects. Furthermore, we present an overview of the possibilities for a functional analysis of the obtained data including subsequent machine learning approaches in form of (i) supervised classification algorithms in class validation and (ii) unsupervised clustering in class discovery.",

author = "Kebschull, Moritz and Kroeger, {Annika Therese} and Papapanou, {Panos N.}",

note = "Publisher Copyright: {\textcopyright} 2023, The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature.",

year = "2023",

doi = "10.1007/978-1-0716-2780-8_19",

language = "English",

series = "Methods in Molecular Biology",

publisher = "Humana Press Inc.",

pages = "317--351",

booktitle = "Methods in Molecular Biology",

}

TY - CHAP

T1 - Differential Expression, Functional and Machine Learning Analysis of High-Throughput –Omics Data Using Open-Source Tools

AU - Kebschull, Moritz

AU - Kroeger, Annika Therese

AU - Papapanou, Panos N.

PY - 2023

Y1 - 2023

N2 - Today, –omics analyses, including the systematic cataloging of messenger RNA and microRNA sequences or DNA methylation patterns in a cell population, organ or tissue sample, allow for an unbiased, comprehensive genome-level analysis of complex diseases, offering a large advantage over earlier “candidate” gene or pathway analyses. A primary goal in the analysis of these high-throughput assays is the detection of those features among several thousand that differ between different groups of samples. In the context of oral biology, our group has successfully utilized –omics technology to identify key molecules and pathways in different diagnostic entities of periodontal disease. A major issue when inferring biological information from high-throughput –omics studies is the fact that the sheer volume of high-dimensional data generated by contemporary technology is not appropriately analyzed using common statistical methods employed in the biomedical sciences. Furthermore, machine learning methods facilitate the detection of additional patterns, beyond the mere identification of lists of features that differ between groups. Herein, we outline a robust and well-accepted bioinformatics workflow for the initial analysis of –omics data using open-source tools. We outline a differential expression analysis pipeline that can be used for data from both arrays and sequencing experiments, and offers the possibility to account for random or fixed effects. Furthermore, we present an overview of the possibilities for a functional analysis of the obtained data including subsequent machine learning approaches in form of (i) supervised classification algorithms in class validation and (ii) unsupervised clustering in class discovery.

AB - Today, –omics analyses, including the systematic cataloging of messenger RNA and microRNA sequences or DNA methylation patterns in a cell population, organ or tissue sample, allow for an unbiased, comprehensive genome-level analysis of complex diseases, offering a large advantage over earlier “candidate” gene or pathway analyses. A primary goal in the analysis of these high-throughput assays is the detection of those features among several thousand that differ between different groups of samples. In the context of oral biology, our group has successfully utilized –omics technology to identify key molecules and pathways in different diagnostic entities of periodontal disease. A major issue when inferring biological information from high-throughput –omics studies is the fact that the sheer volume of high-dimensional data generated by contemporary technology is not appropriately analyzed using common statistical methods employed in the biomedical sciences. Furthermore, machine learning methods facilitate the detection of additional patterns, beyond the mere identification of lists of features that differ between groups. Herein, we outline a robust and well-accepted bioinformatics workflow for the initial analysis of –omics data using open-source tools. We outline a differential expression analysis pipeline that can be used for data from both arrays and sequencing experiments, and offers the possibility to account for random or fixed effects. Furthermore, we present an overview of the possibilities for a functional analysis of the obtained data including subsequent machine learning approaches in form of (i) supervised classification algorithms in class validation and (ii) unsupervised clustering in class discovery.

UR - http://www.scopus.com/inward/record.url?scp=85142720162&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85142720162&partnerID=8YFLogxK

U2 - 10.1007/978-1-0716-2780-8_19

DO - 10.1007/978-1-0716-2780-8_19

M3 - Chapter

C2 - 36418696

AN - SCOPUS:85142720162

T3 - Methods in Molecular Biology

SP - 317

EP - 351

BT - Methods in Molecular Biology

PB - Humana Press Inc.

ER -