Academic Journal

Comparison of the Strengths and Weaknesses of Machine Learning Algorithms and Feature Selection on KEGG Database Microbial Gene Pathway Annotation and Its Effects on Reconstructed Network Topology.

Bibliographic Details
Title: Comparison of the Strengths and Weaknesses of Machine Learning Algorithms and Feature Selection on KEGG Database Microbial Gene Pathway Annotation and Its Effects on Reconstructed Network Topology.
Authors: Robben M; Department of Computer Science and Engineering, University of Texas at Arlington, Arlington, Texas, USA., Nasr MS; Department of Computer Science and Engineering, University of Texas at Arlington, Arlington, Texas, USA., Das A; Department of Computer Science and Engineering, University of Texas at Arlington, Arlington, Texas, USA., Veerla JP; Department of Computer Science and Engineering, University of Texas at Arlington, Arlington, Texas, USA., Huber M; Department of Computer Science and Engineering, University of Texas at Arlington, Arlington, Texas, USA., Jaworski J; Department of Bioengineering, University of Texas at Arlington, Arlington, Texas, USA., Weidanz J; Department of Kinesiology, University of Texas at Arlington, Arlington, Texas, USA., Luber J; Department of Computer Science and Engineering, University of Texas at Arlington, Arlington, Texas, USA.
Source: Journal of computational biology : a journal of computational molecular cell biology [J Comput Biol] 2023 Jul; Vol. 30 (7), pp. 766-782.
Publication Type: Journal Article; Research Support, Non-U.S. Gov't; Research Support, U.S. Gov't, Non-P.H.S.
Language: English
Journal Info: Publisher: Mary Ann Liebert, Inc Country of Publication: United States NLM ID: 9433358 Publication Model: Print Cited Medium: Internet ISSN: 1557-8666 (Electronic) Linking ISSN: 1066-5277 NLM ISO Abbreviation: J Comput Biol Subsets: MEDLINE
Imprint Name(s): Original Publication: New York, NY : Mary Ann Liebert, Inc., c1994-
MeSH Terms: Algorithms* ; Genes, Microbial* ; Humans ; Molecular Sequence Annotation ; Neural Networks, Computer ; Machine Learning
Abstract: The development of tools for the annotation of genes from newly sequenced species has not evolved much from homologous alignment to previously annotated species. While the quality of gene annotations continues to decline as we sequence and assemble more evolutionarily distant gut microbiome species, machine learning presents a high-quality alternative to traditional techniques. In this study, we investigate the relative performance of common classical and nonclassical machine learning algorithms on the problem of gene annotation using human microbiome-associated species genes from the KEGG database. The majority of the ensemble, clustering, and deep learning algorithms that we investigated showed higher prediction accuracy than CD-Hit in predicting partial KEGG function. Motif-based machine learning methods of annotation in new species were faster and had higher precision-recall than methods of homologous alignment or orthologous gene clustering. Gradient boosted ensemble methods and neural networks also predicted higher connectivity in reconstructed KEGG pathways, finding twice as many new pathway interactions as BLAST alignment. The use of motif-based machine learning algorithms in annotation software will allow researchers to develop powerful tools to interact with bacterial microbiomes in ways previously unachievable through homologous sequence alignment alone.
Contributed Indexing: Keywords: biological databases; functional annotation; machine learning; network biology
Entry Date(s): Date Created: 20230712 Date Completed: 20230714 Latest Revision: 20230718
Update Code: 20231215
DOI: 10.1089/cmb.2022.0370
PMID: 37437088
Database: MEDLINE
Description
ISSN: 1557-8666
DOI: 10.1089/cmb.2022.0370