Academic Journal

Knot or not? Identifying unknotted proteins in knotted families with sequence-based Machine Learning model.

Bibliographic Details
Title: Knot or not? Identifying unknotted proteins in knotted families with sequence-based Machine Learning model.
Authors: Sikora M; Centre of New Technologies, University of Warsaw, Warsaw, Poland.; Faculty of Mathematics, Informatics and Mechanics, University of Warsaw, Warsaw, Poland., Klimentova E; Central European Institute of Technology, Masaryk University, Brno, Czech Republic.; National Centre for Biomolecular Research, Faculty of Science, Masaryk University, Brno, Czech Republic., Uchal D; Centre of New Technologies, University of Warsaw, Warsaw, Poland.; Faculty of Physics, University of Warsaw, Warsaw, Poland., Sramkova D; Central European Institute of Technology, Masaryk University, Brno, Czech Republic.; National Centre for Biomolecular Research, Faculty of Science, Masaryk University, Brno, Czech Republic., Perlinska AP; Centre of New Technologies, University of Warsaw, Warsaw, Poland., Nguyen ML; Centre of New Technologies, University of Warsaw, Warsaw, Poland., Korpacz M; Centre of New Technologies, University of Warsaw, Warsaw, Poland.; Faculty of Mathematics, Informatics and Mechanics, University of Warsaw, Warsaw, Poland., Malinowska R; Centre of New Technologies, University of Warsaw, Warsaw, Poland.; Faculty of Mathematics, Informatics and Mechanics, University of Warsaw, Warsaw, Poland., Nowakowski S; Faculty of Mathematics, Informatics and Mechanics, University of Warsaw, Warsaw, Poland.; Faculty of Physics, University of Warsaw, Warsaw, Poland., Rubach P; Centre of New Technologies, University of Warsaw, Warsaw, Poland.; Warsaw School of Economics, Warsaw, Poland., Simecek P; Central European Institute of Technology, Masaryk University, Brno, Czech Republic., Sulkowska JI; Centre of New Technologies, University of Warsaw, Warsaw, Poland.
Source: Protein science : a publication of the Protein Society [Protein Sci] 2024 Jul; Vol. 33 (7), pp. e4998.
Publication Type: Journal Article
Language: English
Journal Info: Publisher: Cold Spring Harbor Laboratory Press; Country of Publication: United States; NLM ID: 9211750; Publication Model: Print; Cited Medium: Internet; ISSN: 1469-896X (Electronic); Linking ISSN: 09618368; NLM ISO Abbreviation: Protein Sci; Subsets: MEDLINE
Imprint Name(s): Publication: 2001- : Woodbury, NY : Cold Spring Harbor Laboratory Press
Original Publication: New York, N.Y. : Cambridge University Press, c1992-
MeSH Terms: Machine Learning*; Proteins*/chemistry; Proteins*/genetics; Databases, Protein*; Models, Molecular*; Protein Conformation; Amino Acid Sequence
Abstract: Knotted proteins, although scarce, are crucial structural components of certain protein families, and their roles continue to be a topic of intense research. Capitalizing on the vast collection of protein structure predictions offered by AlphaFold (AF), this study computationally examines the entire UniProt database to create a robust dataset of knotted and unknotted proteins. Utilizing this dataset, we develop a machine learning (ML) model capable of accurately predicting the presence of knots in protein structures solely from their amino acid sequences. We tested the model's capabilities on 100 proteins whose structures had not yet been predicted by AF and found agreement with our local prediction in 92% of cases. From the point of view of structural biology, we found that all potentially knotted proteins predicted by AF can be classified into only 17 families. This allows us to discover the presence of unknotted proteins in families with a highly conserved knot. We found only three new protein families (UCH, DUF4253, and DUF2254) that contain both knotted and unknotted proteins, and we demonstrate that deletions within the knot core could potentially account for the observed unknotted (trivial) topology. Finally, we have shown that in the majority of knotted families (11 out of 15), the knotted topology is strictly conserved in functional proteins with very low sequence similarity. We have conclusively demonstrated that proteins AF predicts as unknotted are structurally accurate in their unknotted configurations. However, these proteins often represent nonfunctional fragments, lacking significant portions of the knot core (amino acid sequence).
(© 2024 The Author(s). Protein Science published by Wiley Periodicals LLC on behalf of The Protein Society.)
Grant Information: UMO-2018/31/B/NZ1/04016 Narodowe Centrum Nauki; 2021/43/I/NZ1/03341 Narodowe Centrum Nauki; 23-04260L Grantová Agentura České Republiky; e-INFRA CZ LM2018140 Ministry of Education, Youth and Sport of the Czech Republic; 90254 Ministry of Education, Youth and Sport of the Czech Republic
Contributed Indexing: Keywords: AlphaFold; SPOUT family proteins; deep learning; knotted proteins; protein topology
Substance Nomenclature: 0 (Proteins)
Entry Date(s): Date Created: 20240618; Date Completed: 20240618; Latest Revision: 20240620
Update Code: 20240620
PubMed Central ID: PMC11184937
DOI: 10.1002/pro.4998
PMID: 38888487
Database: MEDLINE