دورية أكاديمية

Identification of Thermophilic Proteins Based on Sequence-Based Bidirectional Representations from Transformer-Embedding Features

التفاصيل البيبلوغرافية
العنوان: Identification of Thermophilic Proteins Based on Sequence-Based Bidirectional Representations from Transformer-Embedding Features
المؤلفون: Hongdi Pei, Jiayu Li, Shuhan Ma, Jici Jiang, Mingxin Li, Quan Zou, Zhibin Lv
المصدر: Applied Sciences, Vol 13, Iss 5, p 2858 (2023)
بيانات النشر: MDPI AG, 2023.
سنة النشر: 2023
المجموعة: LCC:Technology
LCC:Engineering (General). Civil engineering (General)
LCC:Biology (General)
LCC:Physics
LCC:Chemistry
مصطلحات موضوعية: thermophilic proteins, BERT, machine learning, imbalanced dataset, deep learning, Technology, Engineering (General). Civil engineering (General), TA1-2040, Biology (General), QH301-705.5, Physics, QC1-999, Chemistry, QD1-999
الوصف: Thermophilic proteins have great potential to be utilized as biocatalysts in biotechnology. Machine learning algorithms are gaining increasing use in identifying such enzymes, reducing or even eliminating the need for experimental studies. While most previously used machine learning methods were based on manually designed features, we developed BertThermo, a model using Bidirectional Encoder Representations from Transformers (BERT), as an automatic feature extraction tool. This method combines a variety of machine learning algorithms and feature engineering methods, while relying on single-feature encoding based on the protein sequence alone for model input. BertThermo achieved an accuracy of 96.97% and 97.51% in 5-fold cross-validation and in independent testing, respectively, identifying thermophilic proteins more reliably than any previously described predictive algorithm. Additionally, BertThermo was tested by a balanced dataset, an imbalanced dataset and a dataset with homology sequences, and the results show that BertThermo was with the best robustness as comparied with state-of-the-art methods. The source code of BertThermo is available.
نوع الوثيقة: article
وصف الملف: electronic resource
اللغة: English
تدمد: 2076-3417
Relation: https://www.mdpi.com/2076-3417/13/5/2858; https://doaj.org/toc/2076-3417
DOI: 10.3390/app13052858
URL الوصول: https://doaj.org/article/aec2668785a6425fbf53cb813ed51093
رقم الأكسشن: edsdoj.2668785a6425fbf53cb813ed51093
قاعدة البيانات: Directory of Open Access Journals
الوصف
تدمد:20763417
DOI:10.3390/app13052858