Academic Journal

Beating Naive Bayes at Taxonomic Classification of 16S rRNA Gene Sequences.

Bibliographic Details
Title: Beating Naive Bayes at Taxonomic Classification of 16S rRNA Gene Sequences.
Authors: Ziemski M; Laboratory of Food Systems Biotechnology, Institute of Food, Nutrition, and Health, ETH Zürich, Zurich, Switzerland., Wisanwanichthan T; School of Science, University of New South Wales, Canberra, ACT, Australia., Bokulich NA; Laboratory of Food Systems Biotechnology, Institute of Food, Nutrition, and Health, ETH Zürich, Zurich, Switzerland., Kaehler BD; School of Science, University of New South Wales, Canberra, ACT, Australia.
Source: Frontiers in microbiology [Front Microbiol] 2021 Jun 18; Vol. 12, pp. 644487. Date of Electronic Publication: 2021 Jun 18 (Print Publication: 2021).
Publication Type: Journal Article
Language: English
Journal Info: Publisher: Frontiers Research Foundation Country of Publication: Switzerland NLM ID: 101548977 Publication Model: eCollection Cited Medium: Print ISSN: 1664-302X (Print) Linking ISSN: 1664302X NLM ISO Abbreviation: Front Microbiol Subsets: PubMed not MEDLINE
Imprint Name(s): Original Publication: Lausanne : Frontiers Research Foundation
Abstract: Naive Bayes classifiers (NBC) have dominated the field of taxonomic classification of amplicon sequences for over a decade. Apart from having runtime requirements that allow them to be trained and used on modest laptops, they have persistently provided class-topping classification accuracy. In this work we compare NBC with random forest classifiers, neural network classifiers, and a perfect classifier that can only fail when different species have identical sequences, and find that in some practical scenarios there is little scope for improving on NBC for taxonomic classification of 16S rRNA gene sequences. Further improvements in taxonomy classification are unlikely to come from novel algorithms alone, and will need to leverage other technological innovations, such as ecological frequency information.
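Illustrative note: the "perfect classifier" in the abstract can be read as an upper bound on accuracy, since no method can tell apart species whose reference sequences are identical. The sketch below is not the authors' implementation. It assumes a hypothetical input of (sequence, species) pairs and assumes that, for a sequence shared by several species, the classifier picks the most frequent species for that sequence.

```python
from collections import defaultdict


def perfect_classifier_accuracy(records):
    """Upper bound on species-level accuracy under exact-sequence ambiguity.

    `records` is an iterable of (sequence, species) pairs (hypothetical format).
    Assumption: when several species share one sequence, the classifier
    predicts the most common species for that sequence.
    """
    # Count how often each species occurs for each distinct sequence.
    species_counts = defaultdict(lambda: defaultdict(int))
    for sequence, species in records:
        species_counts[sequence][species] += 1

    total = 0
    correct = 0
    for counts in species_counts.values():
        total += sum(counts.values())
        # Only the majority species for this sequence can be labeled correctly.
        correct += max(counts.values())

    return correct / total if total else 0.0


# Toy data: "ACGT" is shared by species A (twice) and B (once).
toy = [("ACGT", "A"), ("ACGT", "A"), ("ACGT", "B"), ("GGCC", "C")]
print(perfect_classifier_accuracy(toy))  # 0.75
```

On the toy data, one of four records is unavoidably misclassified, so the ceiling is 0.75. A real classifier's accuracy would be compared against a ceiling of this kind.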
Competing Interests: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
(Copyright © 2021 Ziemski, Wisanwanichthan, Bokulich and Kaehler.)
Contributed Indexing: Keywords: machine learning; marker-gene sequencing; metagenomics; microbiome; neural networks; taxonomic classification
Entry Date(s): Date Created: 20210705 Latest Revision: 20240402
Update Code: 20240402
PubMed Central ID: PMC8249850
DOI: 10.3389/fmicb.2021.644487
PMID: 34220738
Database: MEDLINE
Description
ISSN: 1664-302X