Academic Journal

iPRESTO: Automated discovery of biosynthetic sub-clusters linked to specific natural product substructures.

Bibliographic Details
Title: iPRESTO: Automated discovery of biosynthetic sub-clusters linked to specific natural product substructures.
Authors: Louwen JJR; Bioinformatics Group, Wageningen University, Wageningen, the Netherlands., Kautsar SA; Bioinformatics Group, Wageningen University, Wageningen, the Netherlands., van der Burg S; Netherlands eScience Center, Amsterdam, the Netherlands., Medema MH; Bioinformatics Group, Wageningen University, Wageningen, the Netherlands., van der Hooft JJJ; Bioinformatics Group, Wageningen University, Wageningen, the Netherlands.; Department of Biochemistry, University of Johannesburg, Johannesburg, South Africa.
Source: PLoS computational biology [PLoS Comput Biol] 2023 Feb 09; Vol. 19 (2), pp. e1010462. Date of Electronic Publication: 2023 Feb 09 (Print Publication: 2023).
Publication Type: Journal Article; Research Support, Non-U.S. Gov't
Language: English
Journal Info: Publisher: Public Library of Science Country of Publication: United States NLM ID: 101238922 Publication Model: eCollection Cited Medium: Internet ISSN: 1553-7358 (Electronic) Linking ISSN: 1553734X NLM ISO Abbreviation: PLoS Comput Biol Subsets: MEDLINE
Imprint Name(s): Original Publication: San Francisco, CA : Public Library of Science, [2005]-
MeSH Terms: Biological Products*, Tandem Mass Spectrometry ; Metabolomics ; Bacteria/genetics ; Multigene Family
Abstract: Microbial specialised metabolism is full of valuable natural products that are applied clinically, agriculturally, and industrially. The genes that encode their biosynthesis are often physically clustered on the genome in biosynthetic gene clusters (BGCs). Many BGCs consist of multiple groups of co-evolving genes called sub-clusters that are responsible for the biosynthesis of a specific chemical moiety in a natural product. Sub-clusters therefore provide an important link between the structures of a natural product and its BGC, which can be leveraged for predicting natural product structures from sequence, as well as for linking chemical structures and metabolomics-derived mass features to BGCs. While some initial computational methodologies have been devised for sub-cluster detection, current approaches are not scalable, have only been run on small and outdated datasets, or produce an impractically large number of possible sub-clusters to mine through. Here, we constructed a scalable method for unsupervised sub-cluster detection, called iPRESTO, based on topic modelling and statistical analysis of co-occurrence patterns of enzyme-coding protein families. iPRESTO was used to mine sub-clusters across 150,000 prokaryotic BGCs from antiSMASH-DB. After annotating a fraction of the resulting sub-cluster families, we could predict a substructure for 16% of the antiSMASH-DB BGCs. Additionally, our method was able to confirm 83% of the experimentally characterised sub-clusters in MIBiG reference BGCs. Based on iPRESTO-detected sub-clusters, we could correctly identify the BGCs for xenorhabdin and salbostatin biosynthesis (which had not yet been annotated in BGC databases), as well as propose a candidate BGC for akashin biosynthesis. Additionally, we show for a collection of 145 actinobacteria how substructures can aid in linking BGCs to molecules by correlating iPRESTO-detected sub-clusters to MS/MS-derived Mass2Motifs substructure patterns.
This work paves the way for deeper functional and structural annotation of microbial BGCs by improved linking of orphan molecules to their cognate gene clusters, thus facilitating accelerated natural product discovery.
Competing Interests: We have read the journal’s policy and the authors of this manuscript have the following competing interests: M.H.M. is on the scientific advisory board of Hexagon Bio and co-founder of Design Pharmaceuticals. JJJvdH is a member of the Scientific Advisory Board of NAICONS Srl., Milano, Italy. All other authors have declared that no competing interests exist.
(Copyright: © 2023 Louwen et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.)
Substance Nomenclature: 0 (Biological Products)
Entry Date(s): Date Created: 20230209 Date Completed: 20230224 Latest Revision: 20230228
Update Code: 20230301
PubMed Central ID: PMC9946207
DOI: 10.1371/journal.pcbi.1010462
PMID: 36758069
Database: MEDLINE