Academic Journal

High-quality MS/MS spectrum prediction for data-dependent and data-independent acquisition data analysis.

Bibliographic Details
Title: High-quality MS/MS spectrum prediction for data-dependent and data-independent acquisition data analysis.
Authors: Tiwary S; Computational Systems Biochemistry Research Group, Max Planck Institute of Biochemistry, Martinsried, Germany.
Levy R; Verily Life Sciences, South San Francisco, CA, USA.
Gutenbrunner P; Computational Systems Biochemistry Research Group, Max Planck Institute of Biochemistry, Martinsried, Germany.
Salinas Soto F; Computational Systems Biochemistry Research Group, Max Planck Institute of Biochemistry, Martinsried, Germany.
Palaniappan KK; Verily Life Sciences, South San Francisco, CA, USA.
Deming L; Google LLC, Mountain View, CA, USA.
Berndl M; Google LLC, Mountain View, CA, USA.
Brant A; Verily Life Sciences, South San Francisco, CA, USA.
Cimermancic P; Verily Life Sciences, South San Francisco, CA, USA. cpeter@verily.com
Cox J; Computational Systems Biochemistry Research Group, Max Planck Institute of Biochemistry, Martinsried, Germany; Department of Biological and Medical Psychology, University of Bergen, Bergen, Norway. cox@biochem.mpg.de
Source: Nature Methods [Nat Methods] 2019 Jun; Vol. 16 (6), pp. 519-525. Date of Electronic Publication: 2019 May 27.
Publication Type: Journal Article; Research Support, Non-U.S. Gov't
Language: English
Journal Info: Publisher: Nature Pub. Group; Country of Publication: United States; NLM ID: 101215604; Publication Model: Print-Electronic; Cited Medium: Internet; ISSN: 1548-7105 (Electronic); Linking ISSN: 1548-7091; NLM ISO Abbreviation: Nat Methods; Subsets: MEDLINE
Imprint Name(s): Original Publication: New York, NY : Nature Pub. Group, c2004-
MeSH Terms: Data Analysis*; Peptide Library*; Software*; Biomarkers/*blood; Peptide Fragments/*analysis; Proteome/*analysis; Tandem Mass Spectrometry/*methods; Algorithms; Amino Acid Sequence; Databases, Protein; HeLa Cells; Humans; Peptide Fragments/metabolism; Proteome/metabolism
Abstract: Peptide fragmentation spectra are routinely predicted in the interpretation of mass-spectrometry-based proteomics data. However, the generation of fragment ions has not been understood well enough for scientists to estimate fragment ion intensities accurately. Here, we demonstrate that machine learning can predict peptide fragmentation patterns in mass spectrometers with accuracy within the uncertainty of measurement. Moreover, analysis of our models reveals that peptide fragmentation depends on long-range interactions within a peptide sequence. We illustrate the utility of our models by applying them to the analysis of both data-dependent and data-independent acquisition datasets. In the former case, we observe a q-value-dependent increase in the total number of peptide identifications. In the latter case, we confirm that the use of predicted tandem mass spectrometry spectra is nearly equivalent to the use of spectra from experimental libraries.
Comments: Comment in: Nat Methods. 2019 Jun;16(6):469-470. (PMID: 31147636)
Substance Nomenclature: 0 (Biomarkers)
0 (Peptide Fragments)
0 (Peptide Library)
0 (Proteome)
Entry Dates: Date Created: 20190529; Date Completed: 20190708; Latest Revision: 20210331
Update Code: 20240628
DOI: 10.1038/s41592-019-0427-6
PMID: 31133761
Database: MEDLINE