Academic Journal

Exploring sequence-based features for the improved prediction of DNA N4-methylcytosine sites in multiple species.

Bibliographic Details
Title: Exploring sequence-based features for the improved prediction of DNA N4-methylcytosine sites in multiple species.
Authors: Wei L; School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin, China., Luan S; School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin, China., Nagai LAE; Lab of Functional Analysis In Silico, Institute of Medical Science, University of Tokyo, Tokyo, Japan., Su R; School of Computer Software, College of Intelligence and Computing, Tianjin University, Tianjin, China., Zou Q; School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin, China.; Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China.
Source: Bioinformatics (Oxford, England) [Bioinformatics] 2019 Apr 15; Vol. 35 (8), pp. 1326-1333.
Publication Type: Journal Article; Research Support, Non-U.S. Gov't
Language: English
Journal Info: Publisher: Oxford University Press Country of Publication: England NLM ID: 9808944 Publication Model: Print Cited Medium: Internet ISSN: 1367-4811 (Electronic) Linking ISSN: 13674803 NLM ISO Abbreviation: Bioinformatics Subsets: MEDLINE
Imprint Name(s): Original Publication: Oxford : Oxford University Press, c1998-
MeSH Terms: Support Vector Machine*, DNA/*genetics, Algorithms ; Genome ; Machine Learning
Abstract: Motivation: As one of the important epigenetic modifications, DNA N4-methylcytosine (4mC) has recently been shown to play crucial roles in restriction-modification systems. To better understand its functional mechanisms, it is fundamentally important to identify 4mC modifications. Machine learning methods have recently emerged as an effective and efficient approach for the high-throughput identification of 4mC sites, although high predictive error rates remain a challenge for existing methods. Therefore, it is highly desirable to develop a computational method that identifies 4mC sites more accurately.
Results: In this study, we propose a machine learning based predictor, namely 4mcPred-SVM, for the genome-wide detection of DNA 4mC sites. In this predictor, we present a new feature representation algorithm that sufficiently exploits sequence-based information. To improve the feature representation ability, we use a two-step feature optimization strategy, thereby obtaining the most representative features. Using the resulting features and a Support Vector Machine (SVM), we adaptively train the optimal models for different species. Comparative results on benchmark datasets from six species indicate that our predictor generally achieves better performance in predicting 4mC sites than the state-of-the-art predictors. Importantly, the sequence-based features can reliably and robustly predict 4mC sites, facilitating the discovery of potentially important sequence characteristics for the prediction of 4mC sites.
Availability and Implementation: A user-friendly webserver implementing the proposed 4mcPred-SVM has been established and is freely accessible at http://server.malab.cn/4mcPred-SVM.
Supplementary Information: Supplementary data are available at Bioinformatics online.
(© The Author(s) 2018. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.)
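The abstract describes sequence-based feature representation followed by a two-step feature optimization before SVM training, but it does not specify the encodings or the selection criteria. The sketch below is illustrative only and is not the authors' method: it assumes normalized k-mer frequencies (k = 1 to 3) as the sequence-based features and uses the F-score ranking of Chen and Lin as one plausible first selection step. The function names `kmer_features`, `f_scores` and `rank_features` are hypothetical.

```python
from itertools import product


def kmer_features(seq, ks=(1, 2, 3)):
    """Hypothetical encoding: normalized k-mer frequencies for each k in ks.

    Returns 4 + 16 + 64 = 84 values for the default ks. Windows containing
    symbols other than A/C/G/T (e.g. N) are skipped.
    """
    seq = seq.upper()
    features = []
    for k in ks:
        kmers = ["".join(p) for p in product("ACGT", repeat=k)]
        counts = dict.fromkeys(kmers, 0)
        total = 0
        for i in range(len(seq) - k + 1):
            window = seq[i:i + k]
            if window in counts:
                counts[window] += 1
                total += 1
        # Divide by the number of valid windows so sequences of any length compare.
        features.extend(counts[km] / total if total else 0.0 for km in kmers)
    return features


def f_scores(pos, neg):
    """F-score (Chen and Lin style) of each feature column for two classes.

    Assumes at least two samples per class (sample variance uses n - 1).
    A larger score means the feature separates the classes better.
    """
    scores = []
    for j in range(len(pos[0])):
        xp = [row[j] for row in pos]
        xn = [row[j] for row in neg]
        mp = sum(xp) / len(xp)
        mn = sum(xn) / len(xn)
        m = (sum(xp) + sum(xn)) / (len(xp) + len(xn))
        between = (mp - m) ** 2 + (mn - m) ** 2
        within = (sum((x - mp) ** 2 for x in xp) / (len(xp) - 1)
                  + sum((x - mn) ** 2 for x in xn) / (len(xn) - 1))
        if within > 0:
            scores.append(between / within)
        else:
            # Zero within-class spread: perfectly separating if the means differ.
            scores.append(float("inf") if between > 0 else 0.0)
    return scores


def rank_features(pos, neg):
    """Feature indices sorted from highest to lowest F-score."""
    scores = f_scores(pos, neg)
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
```

In a pipeline of this shape, a second step (for example, adding top-ranked features one at a time and keeping the subset with the best cross-validated SVM accuracy) would follow, and the final classifier could be trained with a standard SVM implementation such as LIBSVM or scikit-learn's `SVC`. Both of these steps are assumptions and are not shown here.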
Substance Nomenclature: 9007-49-2 (DNA)
Entry Date(s): Date Created: 20180922 Date Completed: 20200218 Latest Revision: 20220331
Update Code: 20231215
DOI: 10.1093/bioinformatics/bty824
PMID: 30239627
Database: MEDLINE