Academic Journal

Protein remote homology detection and structural alignment using deep learning.

Bibliographic Details
Title: Protein remote homology detection and structural alignment using deep learning.
Authors: Hamamsy T; Center for Data Science, New York University, New York, NY, USA., Morton JT; Center for Computational Biology, Flatiron Institute, Simons Foundation, New York, NY, USA.; Biostatistics and Bioinformatics Branch, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, MD, USA., Blackwell R; Scientific Computing Core, Flatiron Institute, Simons Foundation, New York, NY, USA., Berenberg D; Department of Computer Science, Courant Institute of Mathematical Sciences, New York University, New York, NY, USA.; Prescient Design, New York, NY, USA., Carriero N; Scientific Computing Core, Flatiron Institute, Simons Foundation, New York, NY, USA., Gligorijevic V; Prescient Design, New York, NY, USA., Strauss CEM; Bioscience Division, Los Alamos National Laboratory, Los Alamos, NM, USA., Leman JK; Center for Computational Biology, Flatiron Institute, Simons Foundation, New York, NY, USA., Cho K; Center for Data Science, New York University, New York, NY, USA. kc119@nyu.edu.; Department of Computer Science, Courant Institute of Mathematical Sciences, New York University, New York, NY, USA. kc119@nyu.edu.; Prescient Design, New York, NY, USA. kc119@nyu.edu.; CIFAR, Toronto, Ontario, Canada. kc119@nyu.edu., Bonneau R; Center for Data Science, New York University, New York, NY, USA. bonneaur@gene.com.; Department of Computer Science, Courant Institute of Mathematical Sciences, New York University, New York, NY, USA. bonneaur@gene.com.; Prescient Design, New York, NY, USA. bonneaur@gene.com.; Department of Biology, New York University, New York, NY, USA. bonneaur@gene.com.
Source: Nature biotechnology [Nat Biotechnol] 2024 Jun; Vol. 42 (6), pp. 975-985. Date of Electronic Publication: 2023 Sep 07.
Publication Type: Journal Article
Language: English
Journal Info: Publisher: Nature America Publishing; Country of Publication: United States; NLM ID: 9604648; Publication Model: Print-Electronic; Cited Medium: Internet; ISSN: 1546-1696 (Electronic); Linking ISSN: 10870156; NLM ISO Abbreviation: Nat Biotechnol; Subsets: MEDLINE
Imprint Name(s): Publication: New York, NY : Nature America Publishing
Original Publication: New York, NY : Nature Pub. Co., [1996-
MeSH Terms: Deep Learning* ; Proteins*/chemistry ; Sequence Alignment*/methods ; Databases, Protein* ; Sequence Analysis, Protein/methods ; Computational Biology/methods ; Algorithms ; Protein Conformation
Abstract: Exploiting sequence-structure-function relationships in biotechnology requires improved methods for aligning proteins that have low sequence similarity to previously annotated proteins. We develop two deep learning methods to address this gap, TM-Vec and DeepBLAST. TM-Vec allows searching for structure-structure similarities in large sequence databases. It is trained to accurately predict TM-scores as a metric of structural similarity directly from sequence pairs without the need for intermediate computation or solution of structures. Once structurally similar proteins have been identified, DeepBLAST can structurally align proteins using only sequence information by identifying structurally homologous regions between proteins. It outperforms traditional sequence alignment methods and performs similarly to structure-based alignment methods. We show the merits of TM-Vec and DeepBLAST on a variety of datasets, including better identification of remotely homologous proteins compared with state-of-the-art sequence alignment and structure prediction methods.
(© 2023. The Author(s).)
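Illustrative note (not part of the indexed record): the abstract uses the TM-score as its measure of structural similarity. The sketch below shows how a TM-score is conventionally computed from the distances between aligned residue pairs after superposition. It is a minimal illustration of the standard definition, not the authors' code; the function names `tm_d0` and `tm_score` are hypothetical, the 0.5 Å floor on the distance scale is an assumed convention for short chains, and the function scores one fixed superposition, whereas the full definition takes the maximum over superpositions.

```python
def tm_d0(target_length: int) -> float:
    """Length-dependent distance scale d0 (angstroms) from the standard TM-score definition."""
    # Assumption of this sketch: floor d0 at 0.5 so that short chains
    # (where the cube-root formula breaks down) still get a positive scale.
    if target_length <= 15:
        return 0.5
    return max(0.5, 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8)


def tm_score(distances, target_length: int) -> float:
    """TM-score for ONE fixed superposition.

    distances: distances (angstroms) between aligned residue pairs.
    target_length: length of the target protein used for normalisation.
    Unaligned target residues contribute nothing to the sum, so a partial
    alignment is penalised by the division by target_length.
    """
    d0 = tm_d0(target_length)
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances)
    return total / target_length


if __name__ == "__main__":
    # A perfect superposition of all 100 residues scores 1.0.
    print(tm_score([0.0] * 100, 100))
    # Aligning only half of the target perfectly scores 0.5.
    print(tm_score([0.0] * 50, 100))
```

Scores fall in (0, 1], and higher values indicate closer structural agreement. A model such as TM-Vec, as described in the abstract, is trained to predict this quantity from sequence pairs rather than computing it from solved structures.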
Comments: Comment in: Nat Genet. 2023 Oct;55(10):1609. doi: 10.1038/s41588-023-01543-3. (PMID: 37816889)
Grant Information: R35GM122515 National Science Foundation (NSF); IOS-1546218 National Science Foundation (NSF); R35 GM122515 United States GM NIGMS NIH HHS; R01 DK103358 United States DK NIDDK NIH HHS; CBET-1728858 National Science Foundation (NSF); R01 AI130945 United States AI NIAID NIH HHS
Substance Nomenclature: 0 (Proteins)
Entry Date(s): Date Created: 20230907 Date Completed: 20240616 Latest Revision: 20240620
Update Code: 20240620
PubMed Central ID: PMC11180608
DOI: 10.1038/s41587-023-01917-2
PMID: 37679542
Database: MEDLINE
Description
ISSN: 1546-1696
DOI:10.1038/s41587-023-01917-2