Academic Journal

Comparative analyses between retained introns and constitutively spliced introns in Arabidopsis thaliana using random forest and support vector machine.

Bibliographic Details
Title: Comparative analyses between retained introns and constitutively spliced introns in Arabidopsis thaliana using random forest and support vector machine.
Authors: Mao R; College of Mechanical and Electronic Engineering, Northwest A&F University, Yangling, Shaanxi, China; College of Information Engineering, Northwest A&F University, Yangling, Shaanxi, China; Department of Biology, Miami University, Oxford, Ohio, United States of America., Raj Kumar PK; Department of Biology, Miami University, Oxford, Ohio, United States of America., Guo C; Department of Biology, Miami University, Oxford, Ohio, United States of America., Zhang Y; College of Mechanical and Electronic Engineering, Northwest A&F University, Yangling, Shaanxi, China; College of Information Engineering, Northwest A&F University, Yangling, Shaanxi, China., Liang C; Department of Biology, Miami University, Oxford, Ohio, United States of America; Department of Computer Sciences and Software Engineering, Miami University, Oxford, Ohio, United States of America.
Source: PloS one [PLoS One] 2014 Aug 11; Vol. 9 (8), pp. e104049. Date of Electronic Publication: 2014 Aug 11 (Print Publication: 2014).
Publication Type: Comparative Study; Journal Article; Research Support, N.I.H., Extramural; Research Support, Non-U.S. Gov't
Language: English
Journal Info: Publisher: Public Library of Science; Country of Publication: United States; NLM ID: 101285081; Publication Model: eCollection; Cited Medium: Internet; ISSN: 1932-6203 (Electronic); Linking ISSN: 19326203; NLM ISO Abbreviation: PLoS One; Subsets: MEDLINE
Imprint Name(s): Original Publication: San Francisco, CA : Public Library of Science
MeSH Terms: RNA Splicing*; Support Vector Machine*; Arabidopsis/*genetics; Computational Biology/*methods; Introns/*genetics; Exons/genetics; RNA, Messenger/genetics; RNA, Messenger/metabolism
Abstract: Alternative splicing is one of the important modes of post-transcriptional pre-mRNA modification. It allows many distinct mature mRNA transcripts to be created from a single gene through the use of different splice sites. In plants such as Arabidopsis thaliana, the most common type of alternative splicing is intron retention. Past studies have focused on the positional distribution of retained introns (RIs) among different genic regions and on their expression regulation, whereas little systematic classification of RIs versus constitutively spliced introns (CSIs) has been conducted with machine learning approaches. We used random forest and a support vector machine (SVM) with a radial basis function (RBF) kernel to differentiate these two types of introns in Arabidopsis. We obtained high-quality experimental data by comparing the intron coordinates of all annotated mRNAs in TAIR10. To distinguish RIs from CSIs, we investigated the characteristics unique to RIs relative to CSIs and extracted 37 quantitative features: local and global nucleotide sequence features of introns, frequent motifs, splice-site signal strength, and the similarity between intron sequences and their flanking regions. Our proposed feature extraction approach classified RIs versus CSIs more accurately than four other approaches. The optimal SVM penalty parameter C and RBF kernel parameter γ were selected with a particle swarm optimization algorithm (PSOSVM). Classification achieved an F-Measure of 80.8% with random forest and 77.4% with PSOSVM. Using our feature extraction approach, we characterized the basic sequence features and positional distribution of RIs and predicted putative regulatory motifs involved in intron splicing. Our study will facilitate a better understanding of the mechanisms underlying intron retention.
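The abstract reports classifier performance as F-Measure and tunes the SVM penalty parameter C together with the RBF kernel parameter. As a minimal sketch under standard textbook definitions (not the authors' implementation), the Python below computes the RBF kernel K(x, z) = exp(−γ‖x − z‖²) and the F-Measure as the harmonic mean of precision and recall. The function names `rbf_kernel` and `f_measure`, and the encoding of RIs as label 1 and CSIs as label 0, are hypothetical choices for illustration.

```python
import math
from typing import Sequence


def rbf_kernel(x: Sequence[float], z: Sequence[float], gamma: float) -> float:
    """Gaussian RBF kernel K(x, z) = exp(-gamma * ||x - z||^2).

    x and z are feature vectors (e.g. per-intron features); gamma > 0
    controls how quickly similarity decays with squared Euclidean distance.
    """
    if len(x) != len(z):
        raise ValueError("feature vectors must have the same length")
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * squared_distance)


def f_measure(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """F-measure (F1): harmonic mean of precision and recall for `positive`.

    Hypothetical label encoding: 1 = retained intron (RI), 0 = CSI.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("label sequences must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are 0 (or undefined), so F1 is 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy labels, not data from the study: 2 TP, 1 FP, 1 FN.
    truth = [1, 1, 1, 0, 0, 0]
    predicted = [1, 1, 0, 1, 0, 0]
    print(round(f_measure(truth, predicted), 3))  # 0.667
    print(round(rbf_kernel([0.0, 0.0], [1.0, 1.0], gamma=0.5), 4))  # 0.3679
```

In the study, C and γ were chosen by particle swarm optimization; this sketch does not reproduce that search or the 37-feature extraction. It only shows the kernel whose γ the search tunes and the metric used to report the results.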
Grant Information: R15 GM094732 United States GM NIGMS NIH HHS; 1R15GM094732-01A1 United States GM NIGMS NIH HHS
Substance Nomenclature: 0 (RNA, Messenger)
Entry Date(s): Date Created: 20140812; Date Completed: 20150415; Latest Revision: 20211021
Update Code: 20240628
PubMed Central ID: PMC4128822
DOI: 10.1371/journal.pone.0104049
PMID: 25110928
Database: MEDLINE