Academic Journal

happi: a hierarchical approach to pangenomics inference.

Bibliographic Details
Title: happi: a hierarchical approach to pangenomics inference.
Authors: Trinh P (Department of Environmental & Occupational Health Sciences, University of Washington, Seattle, WA, USA); Clausen DS (Department of Biostatistics, University of Washington, Seattle, WA, USA); Willis AD (Department of Biostatistics, University of Washington, Seattle, WA, USA; adwillis@uw.edu).
Source: Genome biology [Genome Biol] 2023 Sep 29; Vol. 24 (1), pp. 214. Date of Electronic Publication: 2023 Sep 29.
Publication Type: Journal Article; Research Support, N.I.H., Extramural
Language: English
Journal Info: Publisher: BioMed Central Ltd; Country of Publication: England; NLM ID: 100960660; Publication Model: Electronic; Cited Medium: Internet; ISSN: 1474-760X (Electronic); Linking ISSN: 14747596; NLM ISO Abbreviation: Genome Biol; Subsets: MEDLINE
Imprint Name(s): Publication: London, UK : BioMed Central Ltd
Original Publication: London : Genome Biology Ltd., c2000-
MeSH Terms: Metagenomics*; Microbiota*/genetics; Metagenome; Computer Simulation; Sequence Analysis, DNA
Abstract: Recovering metagenome-assembled genomes (MAGs) from shotgun sequencing data is an increasingly common task in microbiome studies, as MAGs provide deeper insight into the functional potential of both culturable and non-culturable microorganisms. However, metagenome-assembled genomes vary in quality and may contain omissions and contamination. These errors present challenges for detecting genes and comparing gene enrichment across sample types. To address this, we propose happi, an approach to testing hypotheses about gene enrichment that accounts for genome quality. We illustrate the advantages of happi over existing approaches using published Saccharibacteria MAGs, Streptococcus thermophilus MAGs, and via simulation.
(© 2023. BioMed Central Ltd., part of Springer Nature.)
Grant Information: R21 AI168679 United States AI NIAID NIH HHS; R35 GM133420 United States GM NIGMS NIH HHS; T32 ES015459 United States ES NIEHS NIH HHS
Contributed Indexing: Keywords: Hypothesis testing; Metagenome-assembled genomes; Microbiome; Shotgun metagenomics; Statistical models
Entry Date(s): Date Created: 20230929; Date Completed: 20231002; Latest Revision: 20240129
Update Code: 20240129
PubMed Central ID: PMC10540326
DOI: 10.1186/s13059-023-03040-6
PMID: 37773075
Database: MEDLINE