Academic Journal

High performance Legionella pneumophila source attribution using genomics-based machine learning classification.

Bibliographic Details
Title: High performance Legionella pneumophila source attribution using genomics-based machine learning classification.
Authors: Buultjens AH; Department of Microbiology and Immunology, Doherty Institute for Infection and Immunity, University of Melbourne, Melbourne, Victoria, Australia.; Center for Pathogen Genomics, University of Melbourne, Melbourne, Victoria, Australia., Vandelannoote K; Bacterial Phylogenomics Group, Institut Pasteur du Cambodge, Phnom Penh, Cambodia., Mercoulia K; Department of Microbiology and Immunology, Microbiology Diagnostic Unit, Doherty Institute for Infection and Immunity, University of Melbourne, Melbourne, Victoria, Australia., Ballard S; Department of Microbiology and Immunology, Microbiology Diagnostic Unit, Doherty Institute for Infection and Immunity, University of Melbourne, Melbourne, Victoria, Australia., Sloggett C; Department of Microbiology and Immunology, Microbiology Diagnostic Unit, Doherty Institute for Infection and Immunity, University of Melbourne, Melbourne, Victoria, Australia., Howden BP; Center for Pathogen Genomics, University of Melbourne, Melbourne, Victoria, Australia.; Department of Microbiology and Immunology, Microbiology Diagnostic Unit, Doherty Institute for Infection and Immunity, University of Melbourne, Melbourne, Victoria, Australia.; Department of Infectious Diseases, Austin Health, Heidelberg, Victoria, Australia., Seemann T; Department of Microbiology and Immunology, Microbiology Diagnostic Unit, Doherty Institute for Infection and Immunity, University of Melbourne, Melbourne, Victoria, Australia., Stinear TP; Department of Microbiology and Immunology, Doherty Institute for Infection and Immunity, University of Melbourne, Melbourne, Victoria, Australia.; Center for Pathogen Genomics, University of Melbourne, Melbourne, Victoria, Australia.
Source: Applied and environmental microbiology [Appl Environ Microbiol] 2024 Mar 20; Vol. 90 (3), pp. e0129223. Date of Electronic Publication: 2024 Jan 30.
Publication Type: Journal Article
Language: English
Journal Info: Publisher: American Society for Microbiology Country of Publication: United States NLM ID: 7605801 Publication Model: Print-Electronic Cited Medium: Internet ISSN: 1098-5336 (Electronic) Linking ISSN: 00992240 NLM ISO Abbreviation: Appl Environ Microbiol Subsets: MEDLINE
Imprint Name(s): Original Publication: Washington, American Society for Microbiology.
MeSH Terms: Legionella pneumophila*/genetics; Legionnaires' Disease*/epidemiology; Humans; Multilocus Sequence Typing/methods; Genomics/methods; Molecular Epidemiology/methods; Disease Outbreaks
Abstract: Fundamental to effective Legionnaires' disease outbreak control is the ability to rapidly identify the environmental source(s) of the causative agent, Legionella pneumophila. Genomics has revolutionized pathogen surveillance, but L. pneumophila has a complex ecology and population structure that can limit source inference based on standard core genome phylogenetics. Here, we present a powerful machine learning approach that assigns the geographical source of Legionnaires' disease outbreaks more accurately than current core genome comparisons. Models were developed upon 534 L. pneumophila genome sequences, including 149 genomes linked to 20 previously reported Legionnaires' disease outbreaks through detailed case investigations. Our classification models were developed in a cross-validation framework using only environmental L. pneumophila genomes. Assignments of clinical isolate geographic origins demonstrated high predictive sensitivity and specificity of the models, with no false positives or false negatives for 13 out of 20 outbreak groups, despite the presence of within-outbreak polyclonal population structure. Analysis of the same 534-genome panel with a conventional phylogenomic tree and a core genome multi-locus sequence type allelic distance-based classification approach revealed that our machine learning method had the highest overall classification performance (agreement with epidemiological information). Our multivariate statistical learning approach maximizes the use of genomic variation data and is thus well-suited for supporting Legionnaires' disease outbreak investigations.
IMPORTANCE: Identifying the sources of Legionnaires' disease outbreaks is crucial for effective control. Current genomic methods, while useful, often fall short due to the complex ecology and population structure of Legionella pneumophila, the causative agent. Our study introduces a high-performing machine learning approach for more accurate geographical source attribution of Legionnaires' disease outbreaks. Developed using cross-validation on environmental L. pneumophila genomes, our models demonstrate excellent predictive sensitivity and specificity. Importantly, this new approach outperforms traditional methods like phylogenomic trees and core genome multi-locus sequence typing, proving more efficient at leveraging genomic variation data to infer outbreak sources. Our machine learning algorithms, harnessing both core and accessory genomic variation, offer significant promise in public health settings. By enabling rapid and precise source identification in Legionnaires' disease outbreaks, such approaches have the potential to expedite intervention efforts and curtail disease transmission.
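Illustrative note: the abstract reports sensitivity and specificity per outbreak group, including groups with no false positives or false negatives. As a minimal sketch of how such one-vs-rest figures can be tallied from epidemiologically assigned and predicted group labels, the Python below uses hypothetical toy labels (groups A, B, C) and hypothetical function names. It is not the authors' code, and this record does not state how their metrics were computed.

```python
def per_group_metrics(true_labels, predicted_labels):
    """One-vs-rest counts, sensitivity and specificity for each group.

    true_labels: group assigned by epidemiological investigation (assumed input).
    predicted_labels: group assigned by a classifier (assumed input).
    """
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    metrics = {}
    for group in sorted(set(true_labels)):
        tp = fn = fp = tn = 0
        for truth, pred in zip(true_labels, predicted_labels):
            if truth == group and pred == group:
                tp += 1  # isolate from this group, correctly assigned
            elif truth == group:
                fn += 1  # isolate from this group, assigned elsewhere
            elif pred == group:
                fp += 1  # isolate from another group, assigned here
            else:
                tn += 1  # isolate from another group, assigned elsewhere
        metrics[group] = {
            "tp": tp, "fn": fn, "fp": fp, "tn": tn,
            "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
            "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
        }
    return metrics


def perfectly_classified(metrics):
    """Groups with no false positives and no false negatives."""
    return [g for g, m in metrics.items() if m["fp"] == 0 and m["fn"] == 0]


# Hypothetical toy data: six clinical isolates from three outbreak groups.
truth = ["A", "A", "B", "B", "C", "C"]
predicted = ["A", "A", "B", "C", "C", "C"]

results = per_group_metrics(truth, predicted)
for group, m in results.items():
    print(group, m["sensitivity"], m["specificity"])
print("no FP/FN:", perfectly_classified(results))
```

In this toy case one group B isolate is assigned to group C, so B loses sensitivity, C loses specificity, and only group A is free of false positives and false negatives.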
Competing Interests: The authors declare no conflict of interest.
Grant Information: GNT1149991 DHAC | National Health and Medical Research Council (NHMRC); GNT1194325 DHAC | National Health and Medical Research Council (NHMRC)
Contributed Indexing: Keywords: Legionella pneumophila; Legionnaires' disease; bacterial genomics; machine learning; outbreak control; public health; source attribution
Entry Date(s): Date Created: 20240130 Date Completed: 20240321 Latest Revision: 20240322
Update Code: 20240322
PubMed Central ID: PMC10952463
DOI: 10.1128/aem.01292-23
PMID: 38289130
Database: MEDLINE