Academic Journal

Comprehensive Research on Druggable Proteins: From PSSM to Pre-Trained Language Models.

Bibliographic Details
Title: Comprehensive Research on Druggable Proteins: From PSSM to Pre-Trained Language Models.
Authors: Chu H; College of Information Technology, Shanghai Ocean University, Shanghai 201306, China., Liu T; College of Information Technology, Shanghai Ocean University, Shanghai 201306, China.
Source: International journal of molecular sciences [Int J Mol Sci] 2024 Apr 19; Vol. 25 (8). Date of Electronic Publication: 2024 Apr 19.
Publication Type: Journal Article
Language: English
Journal Info: Publisher: MDPI Country of Publication: Switzerland NLM ID: 101092791 Publication Model: Electronic Cited Medium: Internet ISSN: 1422-0067 (Electronic) Linking ISSN: 14220067 NLM ISO Abbreviation: Int J Mol Sci Subsets: MEDLINE
Imprint Name(s): Original Publication: Basel, Switzerland : MDPI, [2000-
MeSH Terms: Proteins*/metabolism ; Computational Biology/methods ; Drug Discovery/methods ; Position-Specific Scoring Matrices ; Databases, Protein ; Humans ; Algorithms
Abstract: Identification of druggable proteins can greatly reduce the cost of discovering new potential drugs. Traditional experimental approaches to exploring these proteins are often costly, slow, and labor-intensive, making them impractical for large-scale research. In response, recent decades have seen a rise in computational methods. These alternatives support drug discovery by creating advanced predictive models. In this study, we proposed a fast and precise classifier for the identification of druggable proteins using a protein language model (PLM) with fine-tuned evolutionary scale modeling 2 (ESM-2) embeddings, achieving 95.11% accuracy on the benchmark dataset. Furthermore, we made a careful comparison to examine the predictive abilities of ESM-2 embeddings and position-specific scoring matrix (PSSM) features by using the same classifiers. The results suggest that ESM-2 embeddings outperformed PSSM features in terms of accuracy and efficiency. Recognizing the potential of language models, we also developed an end-to-end model based on the generative pre-trained transformers 2 (GPT-2) with modifications. To our knowledge, this is the first time a large language model (LLM) GPT-2 has been deployed for the recognition of druggable proteins. Additionally, a more up-to-date dataset, known as Pharos, was adopted to further validate the performance of the proposed model.
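Illustrative note: the abstract compares PSSM features with ESM-2 embeddings but does not describe how either is computed. The sketch below shows, under stated assumptions, what a position-specific scoring matrix is: per-column log-odds scores derived from an alignment. It is a toy and not the authors' pipeline; in practice PSSMs are usually produced by a profile search tool such as PSI-BLAST against a large sequence database. The function names `build_pssm` and `pssm_features`, the uniform background frequency, the pseudocount value, and the column-averaging step are all assumptions made for this example.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(aligned_seqs, pseudocount=1.0):
    """Return one dict per alignment column mapping residue -> log2-odds score.

    Assumptions (not from the record): uniform background frequencies and a
    simple additive pseudocount; gap or unknown characters are ignored.
    """
    if not aligned_seqs:
        raise ValueError("need at least one sequence")
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("aligned sequences must have equal length")

    background = 1.0 / len(AMINO_ACIDS)
    pssm = []
    for col in range(length):
        counts = Counter(s[col] for s in aligned_seqs if s[col] in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        column = {}
        for aa in AMINO_ACIDS:
            freq = (counts[aa] + pseudocount) / total
            column[aa] = math.log2(freq / background)
        pssm.append(column)
    return pssm


def pssm_features(pssm):
    """Average each residue's score over all columns (hypothetical reduction
    to a fixed 20-dimensional vector that a classifier could consume)."""
    n = len(pssm)
    return [sum(col[aa] for col in pssm) / n for aa in AMINO_ACIDS]


# Toy alignment, invented for illustration only.
toy_alignment = ["MKTA", "MKSA", "MRTA"]
pssm = build_pssm(toy_alignment)
features = pssm_features(pssm)
```

A fixed-length vector of this kind can be fed to the same classifier as a pooled language-model embedding, which is the style of like-for-like comparison the abstract describes.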
Contributed Indexing: Keywords: ESM-2; PSSM; deep learning; druggable protein; machine learning
Substance Nomenclature: 0 (Proteins)
Entry Date(s): Date Created: 20240427 Date Completed: 20240427 Latest Revision: 20240429
Update Code: 20240429
PubMed Central ID: PMC11049818
DOI: 10.3390/ijms25084507
PMID: 38674091
Database: MEDLINE