Academic Journal

Stability Oracle: a structure-based graph-transformer framework for identifying stabilizing mutations.

Bibliographic Details
Title: Stability Oracle: a structure-based graph-transformer framework for identifying stabilizing mutations.
Authors: Diaz DJ; UT Austin, Department of Computer Science, Austin, TX, 78712, USA. dannyjdiaz305@gmail.com.; Intelligent Proteins, LLC, Austin, TX, 78712, USA. dannyjdiaz305@gmail.com.; UT Austin, Department of Chemistry, Austin, TX, 78712, USA. dannyjdiaz305@gmail.com., Gong C; UT Austin, Department of Computer Science, Austin, TX, 78712, USA., Ouyang-Zhang J; UT Austin, Department of Computer Science, Austin, TX, 78712, USA., Loy JM; Intelligent Proteins, LLC, Austin, TX, 78712, USA.; UT Austin, Department of Molecular Biosciences, Austin, TX, 78712, USA., Wells J; UT Austin, McKetta Department of Chemical Engineering, Austin, TX, 78712, USA., Yang D; UT Austin, Department of Molecular Biosciences, Austin, TX, 78712, USA., Ellington AD; UT Austin, Department of Molecular Biosciences, Austin, TX, 78712, USA., Dimakis AG; UT Austin, Chandra Family Department of Electrical and Computer Engineering, Austin, TX, 78712, USA., Klivans AR; UT Austin, Department of Computer Science, Austin, TX, 78712, USA.
Source: Nature communications [Nat Commun] 2024 Jul 23; Vol. 15 (1), pp. 6170. Date of Electronic Publication: 2024 Jul 23.
Publication Type: Journal Article
Language: English
Journal Info: Publisher: Nature Pub. Group Country of Publication: England NLM ID: 101528555 Publication Model: Electronic Cited Medium: Internet ISSN: 2041-1723 (Electronic) Linking ISSN: 20411723 NLM ISO Abbreviation: Nat Commun Subsets: MEDLINE
Imprint Name(s): Original Publication: [London] : Nature Pub. Group
MeSH Terms: Mutation* , Protein Stability* , Proteins*/genetics , Proteins*/chemistry , Thermodynamics*, Protein Engineering/methods ; Models, Molecular ; Algorithms ; Neural Networks, Computer ; Protein Conformation ; Computational Biology/methods
Abstract: Engineering stabilized proteins is a fundamental challenge in the development of industrial and pharmaceutical biotechnologies. We present Stability Oracle: a structure-based graph-transformer framework that achieves SOTA performance on accurately identifying thermodynamically stabilizing mutations. Our framework introduces several innovations to overcome well-known challenges in data scarcity and bias, generalization, and computation time, such as: Thermodynamic Permutations for data augmentation, structural amino acid embeddings to model a mutation with a single structure, a protein structure-specific attention-bias mechanism that makes transformers a viable alternative to graph neural networks. We provide training/test splits that mitigate data leakage and ensure proper model evaluation. Furthermore, to examine our data engineering contributions, we fine-tune ESM2 representations (Prostata-IFML) and achieve SOTA for sequence-based models. Notably, Stability Oracle outperforms Prostata-IFML even though it was pretrained on 2000X less proteins and has 548X less parameters. Our framework establishes a path for fine-tuning structure-based transformers to virtually any phenotype, a necessary task for accelerating the development of protein-based biotechnologies.
(© 2024. The Author(s).)
Grant Information: HDTRA 12010001 United States Department of Defense | Defense Threat Reduction Agency (DTRA); F-1654 Welch Foundation
Substance Nomenclature: 0 (Proteins)
Entry Date(s): Date Created: 20240723 Date Completed: 20240723 Latest Revision: 20240726
Update Code: 20240726
PubMed Central ID: PMC11266546
DOI: 10.1038/s41467-024-49780-2
PMID: 39043654
Database: MEDLINE