Academic Journal

Supervised Machine Learning Enables Geospatial Microbial Provenance.

Bibliographic Details
Title: Supervised Machine Learning Enables Geospatial Microbial Provenance.
Authors: Bhattacharya C; Tri-Institutional Computational Biology & Medicine Program, Weill Cornell Medicine, New York, NY 10065, USA.; The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Cornell Medicine, New York, NY 10065, USA.; Integrated Design and Media, Center for Urban Science and Progress, NYU Tandon School of Engineering, Brooklyn, New York, NY 11201, USA., Tierney BT; The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Cornell Medicine, New York, NY 10065, USA.; Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY 10065, USA., Ryon KA; The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Cornell Medicine, New York, NY 10065, USA.; Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY 10065, USA., Bhattacharyya M; Center for Artificial Intelligence and Machine Learning, Indian Statistical Institute, Kolkata 700108, India.; Machine Intelligence Unit, Indian Statistical Institute, Kolkata 700108, India., Hastings JJA; The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Cornell Medicine, New York, NY 10065, USA.; Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY 10065, USA., Basu S; Department of Medicine, Weill Cornell Medicine, New York, NY 10065, USA., Bhattacharya B; Department of Electrical and Electronics Engineering, Birla Institute of Technology, Mesra, Ranchi 835215, India., Bagchi D; Department of Metallurgy & Materials Engineering, Indian Institute of Engineering Science & Technology, Shibpur, Howrah 711103, India., Mukherjee S; Department of Biological Sciences, National University of Singapore, Singapore 117558, Singapore., Wang L; Department of Biological Sciences, National University of Singapore, Singapore 117558, Singapore., Henaff EM; Integrated Design and Media, Center for Urban Science and Progress, NYU Tandon School of Engineering, Brooklyn, New York, NY 11201, USA., Mason CE; The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Cornell Medicine, New York, NY 10065, USA.; Integrated Design and Media, Center for Urban Science and Progress, NYU Tandon School of Engineering, Brooklyn, New York, NY 11201, USA.; WorldQuant Initiative for Quantitative Prediction, Weill Cornell Medicine, New York, NY 10065, USA.
Source: Genes [Genes (Basel)] 2022 Oct 21; Vol. 13 (10). Date of Electronic Publication: 2022 Oct 21.
Publication Type: Journal Article; Research Support, Non-U.S. Gov't; Research Support, N.I.H., Extramural
Language: English
Journal Info: Publisher: MDPI; Country of Publication: Switzerland; NLM ID: 101551097; Publication Model: Electronic; Cited Medium: Internet; ISSN: 2073-4425 (Electronic); Linking ISSN: 2073-4425; NLM ISO Abbreviation: Genes (Basel); Subsets: MEDLINE
Imprint Name(s): Original Publication: Basel : MDPI
MeSH Terms: Metagenomics*/methods; Microbiota*/genetics; Metagenome; Supervised Machine Learning; Cities
Abstract: The recent increase in publicly available metagenomic datasets with geospatial metadata has made it possible to determine location-specific microbial fingerprints from around the world. Such fingerprints can be useful for comparing microbial niches in environmental research, as well as for applications in forensic science and public health. To determine the regional specificity of environmental metagenomes, we examined 4305 shotgun-sequenced samples from the MetaSUB Consortium dataset, the most extensive public collection of urban microbiomes, spanning 60 different cities, 30 countries, and 6 continents. We identified city-specific microbial fingerprints using supervised machine learning (SML) on the taxonomic classifications, and we compared the performance of ten SML classifiers. We then further evaluated the five algorithms with the highest accuracy, whose city-level accuracy ranged from 85% to 89% and whose continent-level accuracy ranged from 90% to 94%. Thereafter, we used these results to develop Cassandra, a random-forest-based classifier that identifies bioindicator species to aid in fingerprinting and can infer higher-order microbial interactions at each site. We further tested the Cassandra algorithm on the Tara Oceans dataset, the largest collection of marine-based microbial genomes, where it classified the oceanic sample locations with 83% accuracy. These results and code show the utility of SML methods and Cassandra for identifying bioindicator species across both oceanic and urban environments, which can help guide ongoing efforts in biotracing, environmental monitoring, and microbial forensics (MF).
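The abstract reports accuracy at two granularities, city and continent. As a minimal sketch of how such a pair of scores could be computed from one set of city predictions, the Python below rolls each city label up to its continent and scores both levels. The lookup table, the sample labels, and the function names are hypothetical illustrations, and the abstract does not say whether the authors derived continent-level accuracy this way or trained separate continent-level classifiers.

```python
from typing import Dict, Sequence, Tuple

# Hypothetical city -> continent lookup (illustrative only; not from the paper).
CITY_TO_CONTINENT: Dict[str, str] = {
    "New York": "North America",
    "Denver": "North America",
    "London": "Europe",
    "Stockholm": "Europe",
    "Tokyo": "Asia",
}


def accuracy(true_labels: Sequence[str], predicted_labels: Sequence[str]) -> float:
    """Fraction of positions where the predicted label equals the true label."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label sequences must have the same length")
    if not true_labels:
        raise ValueError("at least one sample is required")
    hits = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return hits / len(true_labels)


def city_and_continent_accuracy(
    true_cities: Sequence[str],
    predicted_cities: Sequence[str],
    lookup: Dict[str, str] = CITY_TO_CONTINENT,
) -> Tuple[float, float]:
    """Score city predictions directly, then again after mapping to continents."""
    city_acc = accuracy(true_cities, predicted_cities)
    continent_acc = accuracy(
        [lookup[c] for c in true_cities],
        [lookup[c] for c in predicted_cities],
    )
    return city_acc, continent_acc


# Made-up labels for five samples (assumption: not real MetaSUB results).
true_cities = ["New York", "Denver", "London", "Stockholm", "Tokyo"]
predicted_cities = ["New York", "New York", "London", "Tokyo", "Tokyo"]

city_acc, continent_acc = city_and_continent_accuracy(true_cities, predicted_cities)
print(f"city accuracy: {city_acc:.2f}, continent accuracy: {continent_acc:.2f}")
```

Under this roll-up, a wrong city that falls on the correct continent still counts as a continent-level hit, so the continent score can never be lower than the city score.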
Grant Information: U01 DA053941 United States DA NIDA NIH HHS; R01 AI125416 United States AI NIAID NIH HHS; R21 EB031466 United States EB NIBIB NIH HHS; R01 AI151059 United States AI NIAID NIH HHS; R21 AI129851 United States AI NIAID NIH HHS
Contributed Indexing: Keywords: bioindicator species; metagenomics; microbial fingerprint; microbial forensics; supervised machine learning
Entry Date(s): Date Created: 20221027; Date Completed: 20221028; Latest Revision: 20240525
Update Code: 20240525
PubMed Central ID: PMC9601318
DOI: 10.3390/genes13101914
PMID: 36292799
Database: MEDLINE