Academic Journal
Comprehensive Research on Druggable Proteins: From PSSM to Pre-Trained Language Models.
Title: | Comprehensive Research on Druggable Proteins: From PSSM to Pre-Trained Language Models. |
---|---|
Authors: | Chu H; College of Information Technology, Shanghai Ocean University, Shanghai 201306, China., Liu T; College of Information Technology, Shanghai Ocean University, Shanghai 201306, China. |
Source: | International journal of molecular sciences [Int J Mol Sci] 2024 Apr 19; Vol. 25 (8). Date of Electronic Publication: 2024 Apr 19. |
Publication Type: | Journal Article |
Language: | English |
Journal Info: | Publisher: MDPI Country of Publication: Switzerland NLM ID: 101092791 Publication Model: Electronic Cited Medium: Internet ISSN: 1422-0067 (Electronic) Linking ISSN: 14220067 NLM ISO Abbreviation: Int J Mol Sci Subsets: MEDLINE |
Imprint Name(s): | Original Publication: Basel, Switzerland : MDPI, [2000- |
MeSH Terms: | Proteins*/metabolism ; Computational Biology/methods ; Drug Discovery/methods ; Position-Specific Scoring Matrices ; Databases, Protein ; Humans ; Algorithms |
Abstract: | Identification of druggable proteins can greatly reduce the cost of discovering new potential drugs. Traditional experimental approaches to exploring these proteins are often costly, slow, and labor-intensive, making them impractical for large-scale research. In response, recent decades have seen a rise in computational methods. These alternatives support drug discovery by creating advanced predictive models. In this study, we proposed a fast and precise classifier for the identification of druggable proteins using a protein language model (PLM) with fine-tuned evolutionary scale modeling 2 (ESM-2) embeddings, achieving 95.11% accuracy on the benchmark dataset. Furthermore, we made a careful comparison to examine the predictive abilities of ESM-2 embeddings and position-specific scoring matrix (PSSM) features by using the same classifiers. The results suggest that ESM-2 embeddings outperformed PSSM features in terms of accuracy and efficiency. Recognizing the potential of language models, we also developed an end-to-end model based on the generative pre-trained transformers 2 (GPT-2) with modifications. To our knowledge, this is the first time a large language model (LLM) GPT-2 has been deployed for the recognition of druggable proteins. Additionally, a more up-to-date dataset, known as Pharos, was adopted to further validate the performance of the proposed model. |
Contributed Indexing: | Keywords: ESM-2; PSSM; deep learning; druggable protein; machine learning |
Substance Nomenclature: | 0 (Proteins) |
Entry Date(s): | Date Created: 20240427 Date Completed: 20240427 Latest Revision: 20240429 |
Update Code: | 20240429 |
PubMed Central ID: | PMC11049818 |
DOI: | 10.3390/ijms25084507 |
PMID: | 38674091 |
Database: | MEDLINE |
ISSN: | 1422-0067 |
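
The abstract compares PSSM features with ESM-2 embeddings but does not describe how either is computed. As a rough illustration of what a position-specific scoring matrix encodes, the sketch below derives log-odds scores from a toy set of aligned sequences. The names `build_pssm` and `mean_profile`, the uniform background frequency, the pseudocount, and the toy alignment are all assumptions made for illustration. PSSMs are commonly generated with PSI-BLAST against a large sequence database, and this sketch is not the authors' pipeline.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def build_pssm(aligned_seqs, pseudocount=1.0):
    """Return one dict per position mapping residue -> log2-odds score.

    Assumes equal-length, gap-free sequences and a uniform background
    frequency of 1/20 (both are simplifications for illustration).
    """
    if not aligned_seqs:
        raise ValueError("need at least one sequence")
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("sequences must have equal length")

    background = 1.0 / len(AMINO_ACIDS)
    # Denominator includes the pseudocounts so each column sums to 1.
    total = len(aligned_seqs) + pseudocount * len(AMINO_ACIDS)
    pssm = []
    for pos in range(length):
        column = [s[pos] for s in aligned_seqs]
        scores = {}
        for aa in AMINO_ACIDS:
            freq = (column.count(aa) + pseudocount) / total
            scores[aa] = math.log2(freq / background)
        pssm.append(scores)
    return pssm


def mean_profile(pssm):
    """Average scores over positions into a fixed-length 20-value vector."""
    return [sum(row[aa] for row in pssm) / len(pssm) for aa in AMINO_ACIDS]


# Hypothetical toy alignment, not data from the paper.
toy_alignment = ["MKTA", "MKSA", "MRTA"]
pssm = build_pssm(toy_alignment)
features = mean_profile(pssm)
```

Averaging over positions is only one way to turn a variable-length matrix into a fixed-length vector that a classifier can consume. The PSSM feature encoding actually used in the paper is not stated in this record.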