Academic Journal

GeneMark-ETP significantly improves the accuracy of automatic annotation of large eukaryotic genomes.

Bibliographic Details
Title: GeneMark-ETP significantly improves the accuracy of automatic annotation of large eukaryotic genomes.
Authors: Brůna T; School of Biological Sciences, Georgia Institute of Technology, Atlanta, Georgia 30332, USA., Lomsadze A; Wallace H. Coulter Department of Biomedical Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332, USA., Borodovsky M; School of Biological Sciences, Georgia Institute of Technology, Atlanta, Georgia 30332, USA; borodovsky@gatech.edu.; Wallace H. Coulter Department of Biomedical Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332, USA.; School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332, USA.
Source: Genome research [Genome Res] 2024 Jun 25; Vol. 34 (5), pp. 757-768. Date of Electronic Publication: 2024 Jun 25.
Publication Type: Journal Article; Research Support, Non-U.S. Gov't; Research Support, N.I.H., Extramural
Language: English
Journal Info: Publisher: Cold Spring Harbor Laboratory Press; Country of Publication: United States; NLM ID: 9518021; Publication Model: Electronic; Cited Medium: Internet; ISSN: 1549-5469 (Electronic); Linking ISSN: 10889051; NLM ISO Abbreviation: Genome Res; Subsets: MEDLINE
Imprint Name(s): Original Publication: Cold Spring Harbor, N.Y. : Cold Spring Harbor Laboratory Press, c1995-
MeSH Terms: Molecular Sequence Annotation*/methods; Animals; Software; Genome; Genomics/methods; Eukaryota/genetics; Algorithms
Abstract: Large-scale genomic initiatives, such as the Earth BioGenome Project, require efficient methods for eukaryotic genome annotation. Here we present an automatic gene finder, GeneMark-ETP, integrating genomic-, transcriptomic-, and protein-derived evidence that has been developed with a focus on large plant and animal genomes. GeneMark-ETP first identifies genomic loci where extrinsic data are sufficient for making gene predictions with "high confidence." The genes situated in the genomic space between the high-confidence genes are predicted in the next stage. The set of high-confidence genes serves as an initial training set for the statistical model. Further on, the model parameters are iteratively updated in the rounds of gene prediction and parameter re-estimation. Upon reaching convergence, GeneMark-ETP makes the final predictions and delivers the whole complement of predicted genes. GeneMark-ETP outperforms gene finders using a single type of extrinsic evidence. Comparisons with gene finders MAKER2 and TSEBRA, those that use both transcript- and protein-derived extrinsic evidence, show that GeneMark-ETP delivers state-of-the-art gene-prediction accuracy, with the margin of outperforming existing approaches increasing in its application to larger and more complex eukaryotic genomes.
(© 2024 Brůna et al.; Published by Cold Spring Harbor Laboratory Press.)
Comments: Update of: bioRxiv. 2024 Apr 17:2023.01.13.524024. doi: 10.1101/2023.01.13.524024. (PMID: 36711453)
Grant Information: R01 GM128145 United States GM NIGMS NIH HHS
Entry Dates: Date Created: 20240612; Date Completed: 20240625; Latest Revision: 20240709
Update Code: 20240709
PubMed Central ID: PMC11216313
DOI: 10.1101/gr.278373.123
PMID: 38866548
Database: MEDLINE