Academic Journal

RESCRIPt: Reproducible sequence taxonomy reference database management.

Bibliographic Details
Title: RESCRIPt: Reproducible sequence taxonomy reference database management.
Authors: Robeson MS 2nd; University of Arkansas for Medical Sciences, Department of Biomedical Informatics, Little Rock, Arkansas, United States of America., O'Rourke DR; Pathogen and Microbiome Institute, Northern Arizona University, Flagstaff, Arizona, United States of America., Kaehler BD; School of Science, University of New South Wales, Canberra, Australia., Ziemski M; Laboratory of Food Systems Biotechnology, Institute of Food, Nutrition, and Health, ETH Zürich, Switzerland., Dillon MR; Pathogen and Microbiome Institute, Northern Arizona University, Flagstaff, Arizona, United States of America., Foster JT; Pathogen and Microbiome Institute, Northern Arizona University, Flagstaff, Arizona, United States of America., Bokulich NA; Laboratory of Food Systems Biotechnology, Institute of Food, Nutrition, and Health, ETH Zürich, Switzerland.
Source: PLoS computational biology [PLoS Comput Biol] 2021 Nov 08; Vol. 17 (11), pp. e1009581. Date of Electronic Publication: 2021 Nov 08 (Print Publication: 2021).
Publication Type: Journal Article
Language: English
Journal Info: Publisher: Public Library of Science; Country of Publication: United States; NLM ID: 101238922; Publication Model: eCollection; Cited Medium: Internet; ISSN: 1553-7358 (Electronic); Linking ISSN: 1553734X; NLM ISO Abbreviation: PLoS Comput Biol; Subsets: MEDLINE
Imprint Name(s): Original Publication: San Francisco, CA : Public Library of Science, [2005]-
MeSH Terms: Database Management Systems*; Software*; Databases, Genetic/*statistics & numerical data; Animals; Classification; Computational Biology; DNA Barcoding, Taxonomic; Databases, Nucleic Acid; Genomics; Humans; Metagenome; Metagenomics; Microbiota/genetics; Phylogeny; RNA, Ribosomal, 16S/genetics; Sequence Analysis
Abstract: Nucleotide sequence and taxonomy reference databases are critical resources for widespread applications including marker-gene and metagenome sequencing for microbiome analysis, diet metabarcoding, and environmental DNA (eDNA) surveys. Reproducibly generating, managing, using, and evaluating nucleotide sequence and taxonomy reference databases creates a significant bottleneck for researchers aiming to generate custom sequence databases. Furthermore, database composition drastically influences results, and lack of standardization limits cross-study comparisons. To address these challenges, we developed RESCRIPt, a Python 3 software package and QIIME 2 plugin for reproducible generation and management of reference sequence taxonomy databases, including dedicated functions that streamline creating databases from popular sources, and functions for evaluating, comparing, and interactively exploring qualitative and quantitative characteristics across reference databases. To highlight the breadth and capabilities of RESCRIPt, we provide several examples for working with popular databases for microbiome profiling (SILVA, Greengenes, NCBI-RefSeq, GTDB), eDNA and diet metabarcoding surveys (BOLD, GenBank), as well as for genome comparison. We show that bigger is not always better, and reference databases with standardized taxonomies and those that focus on type strains have quantitative advantages, though may not be appropriate for all use cases. Most databases appear to benefit from some curation (quality filtering), though sequence clustering appears detrimental to database quality. Finally, we demonstrate the breadth and extensibility of RESCRIPt for reproducible workflows with a comparison of global hepatitis genomes. RESCRIPt provides tools to democratize the process of reference database acquisition and management, enabling researchers to reproducibly and transparently create reference materials for diverse research applications.
RESCRIPt is released under a permissive BSD-3 license at https://github.com/bokulich-lab/RESCRIPt.
Competing Interests: The authors declare that they have no competing interests.
Substance Nomenclature: 0 (RNA, Ribosomal, 16S)
Entry Date(s): Date Created: 20211108; Date Completed: 20211217; Latest Revision: 20211217
Update Code: 20240628
PubMed Central ID: PMC8601625
DOI: 10.1371/journal.pcbi.1009581
PMID: 34748542
Database: MEDLINE