Academic Journal

PowerNovo: de novo peptide sequencing via tandem mass spectrometry using an ensemble of transformer and BERT models.

Bibliographic Details
Title: PowerNovo: de novo peptide sequencing via tandem mass spectrometry using an ensemble of transformer and BERT models.
Authors: Petrovskiy DV; Institute of Biomedical Chemistry, 119121, Moscow, Russia., Nikolsky KS; Institute of Biomedical Chemistry, 119121, Moscow, Russia., Kulikova LI; Institute of Biomedical Chemistry, 119121, Moscow, Russia., Rudnev VR; Institute of Biomedical Chemistry, 119121, Moscow, Russia., Butkova TV; Institute of Biomedical Chemistry, 119121, Moscow, Russia., Malsagova KA; Institute of Biomedical Chemistry, 119121, Moscow, Russia., Kopylov AT; Institute of Biomedical Chemistry, 119121, Moscow, Russia., Kaysheva AL; Institute of Biomedical Chemistry, 119121, Moscow, Russia. kaysheva1@gmail.com.
Source: Scientific reports [Sci Rep] 2024 Jul 01; Vol. 14 (1), pp. 15000. Date of Electronic Publication: 2024 Jul 01.
Publication Type: Journal Article
Language: English
Journal Info: Publisher: Nature Publishing Group Country of Publication: England NLM ID: 101563288 Publication Model: Electronic Cited Medium: Internet ISSN: 2045-2322 (Electronic) Linking ISSN: 20452322 NLM ISO Abbreviation: Sci Rep Subsets: MEDLINE
Imprint Name(s): Original Publication: London : Nature Publishing Group, copyright 2011-
MeSH Terms: Tandem Mass Spectrometry*/methods; Sequence Analysis, Protein*/methods; Peptides*/chemistry; Peptides*/analysis; Amino Acid Sequence; Software; Proteomics/methods; Algorithms
Abstract: The primary objective of analyzing data from a mass spectrometry-based proteomic experiment is peptide and protein identification, that is, the correct assignment of each tandem mass spectrum to a single amino acid sequence. Comparing empirical fragment spectra with theoretically predicted ones, or matching them against a library of collected spectra, are the commonly accepted strategies for identifying proteins and determining their amino acid sequences. Although these approaches are widely used and appreciably efficient for well-characterized model organisms or measured proteins, they cannot detect novel peptide sequences that are rare or have not been previously annotated. This study presents PowerNovo, a tool for de novo sequencing of proteins from tandem mass spectra acquired on various types of mass analyzers and with different fragmentation techniques. PowerNovo uses an ensemble of models for peptide sequencing: a model that detects regularities in tandem mass spectra, precursors, and fragment ions, and a natural language processing model that assesses peptide sequence quality and helps reconstruct noisy sequences. Testing showed that the performance of PowerNovo is comparable to, and even better than, that of the widely used PointNovo, DeepNovo, Casanovo, and Novor packages. PowerNovo also provides a complete processing pipeline for mass spectrometry data: along with predicting the peptide sequence, it includes peptide assembly and protein inference blocks.
(© 2024. The Author(s).)
Grant Information: 122092200056-9 Russian Federation Fundamental Research Program for the long-term period for 2021-2030
Substance Nomenclature: 0 (Peptides)
Entry Date(s): Date Created: 20240701 Date Completed: 20240702 Latest Revision: 20240704
Update Code: 20240704
PubMed Central ID: PMC11217302
DOI: 10.1038/s41598-024-65861-0
PMID: 38951578
Database: MEDLINE