Academic Journal

Using multi-layer perceptron to identify origins of replication in eukaryotes via informative features

Bibliographic Details
Title: Using multi-layer perceptron to identify origins of replication in eukaryotes via informative features
Authors: Yongxian Fan, Wanru Wang
Source: BMC Bioinformatics, Vol 22, Iss 1, Pp 1-12 (2021)
Publisher Information: BMC, 2021.
Publication Year: 2021
Collection: LCC:Computer applications to medicine. Medical informatics
LCC:Biology (General)
Subject Terms: Eukaryotes, DNA replication, Origin, TF-IDF, Multi-layer perceptron, STREME, Computer applications to medicine. Medical informatics, R858-859.7, Biology (General), QH301-705.5
Description: Abstract. Background: The origin of replication is the site where DNA replication starts, a vital part of the inheritance of genetic information from parents to offspring. Moreover, accurately identifying origins of replication has great practical value for diagnosing and treating diseases caused by errors in genetic information, whereas traditional biological experiments are time-consuming and laborious. Results: We studied origins of replication in a variety of eukaryotes and proposed a separate prediction method for each species. We collected data from 7 species: Homo sapiens, Mus musculus, Drosophila melanogaster, Arabidopsis thaliana, Kluyveromyces lactis, Pichia pastoris and Schizosaccharomyces pombe. In addition to the commonly used sequence feature extraction methods PseKNC-II and Base-content, we designed a feature extraction method based on TF-IDF. A two-step method was then used for feature selection. After comparing a variety of traditional machine learning classification models, we adopted the multi-layer perceptron as the classification algorithm. The data and code used in the experiments are available at https://github.com/Sarahyouzi/EukOriginPredict . Conclusions: After 100 rounds of fivefold cross-validation, the prediction accuracies on the training sets of the seven species listed above reached 92.60%, 90.80%, 91.22%, 96.15%, 96.72%, 99.86% and 96.72%, respectively. This indicates that, compared with other methods, our methods achieve superior performance. In addition, our experiments reveal that the models for different species can predict one another's data with high accuracy, and the STREME results show that these species share a certain common motif.
Document Type: article
File Description: electronic resource
Language: English
ISSN: 1471-2105
Relation: https://doaj.org/toc/1471-2105
DOI: 10.1186/s12859-021-04431-x
Access URL: https://doaj.org/article/3a6e7b6f67e24bcc86c188f6b650477d
Accession Number: edsdoj.3a6e7b6f67e24bcc86c188f6b650477d
Database: Directory of Open Access Journals
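The abstract names TF-IDF as the basis of the authors' own feature extraction method but gives no details. As a rough illustration of how TF-IDF can be applied to DNA, the sketch below treats each sequence as a "document" and its overlapping k-mers as "words". This is an assumption for illustration only: the k-mer length, the function names (`kmers`, `tfidf_kmer_features`) and the smoothed IDF formula are hypothetical choices, not details taken from the paper; the authors' actual code is in the GitHub repository cited in the abstract.

```python
import math
from collections import Counter


def kmers(seq: str, k: int) -> list[str]:
    """Return the overlapping k-mers of a DNA sequence, uppercased."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def tfidf_kmer_features(seqs: list[str], k: int = 3) -> tuple[list[str], list[list[float]]]:
    """Build a TF-IDF matrix: one row per sequence, one column per observed k-mer.

    Hypothetical illustration: each sequence is a "document", each k-mer a "word".
    """
    docs = [Counter(kmers(s, k)) for s in seqs]
    vocab = sorted(set().union(*docs))  # every k-mer seen in any sequence
    n_docs = len(docs)
    # Document frequency: in how many sequences each k-mer occurs.
    df = {term: sum(1 for d in docs if term in d) for term in vocab}
    # Smoothed IDF (assumed variant): k-mers found in fewer sequences get larger weights.
    idf = {term: math.log((1 + n_docs) / (1 + df[term])) + 1 for term in vocab}
    matrix = []
    for d in docs:
        total = sum(d.values())  # number of k-mers in this sequence
        # Term frequency is the k-mer's share of the sequence; the row is all
        # zeros for sequences shorter than k.
        row = [(d[t] / total) * idf[t] if total else 0.0 for t in vocab]
        matrix.append(row)
    return vocab, matrix


vocab, X = tfidf_kmer_features(["ATGCGT", "ATGAAA", "CCCGGG"], k=3)
print(len(vocab), len(X))  # 11 3
```

In this toy example the k-mer ATG occurs in two of the three sequences, so it receives a lower weight than k-mers unique to one sequence. Rows of such a matrix could then be passed to feature selection and a classifier; the abstract states that the authors used a two-step feature selection method and a multi-layer perceptron, whose exact settings are not given in this record.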