Academic Journal
GeneMark-ETP significantly improves the accuracy of automatic annotation of large eukaryotic genomes.
Title: | GeneMark-ETP significantly improves the accuracy of automatic annotation of large eukaryotic genomes. |
---|---|
Authors: | Brůna T; School of Biological Sciences, Georgia Institute of Technology, Atlanta, Georgia 30332, USA., Lomsadze A; Wallace H. Coulter Department of Biomedical Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332, USA., Borodovsky M; School of Biological Sciences, Georgia Institute of Technology, Atlanta, Georgia 30332, USA; borodovsky@gatech.edu.; Wallace H. Coulter Department of Biomedical Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332, USA.; School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332, USA. |
Source: | Genome research [Genome Res] 2024 Jun 25; Vol. 34 (5), pp. 757-768. Date of Electronic Publication: 2024 Jun 25. |
Publication Type: | Journal Article; Research Support, Non-U.S. Gov't; Research Support, N.I.H., Extramural |
Language: | English |
Journal Info: | Publisher: Cold Spring Harbor Laboratory Press Country of Publication: United States NLM ID: 9518021 Publication Model: Electronic Cited Medium: Internet ISSN: 1549-5469 (Electronic) Linking ISSN: 1088-9051 NLM ISO Abbreviation: Genome Res Subsets: MEDLINE |
Imprint Name(s): | Original Publication: Cold Spring Harbor, N.Y. : Cold Spring Harbor Laboratory Press, c1995- |
MeSH Terms: | Molecular Sequence Annotation*/methods ; Animals ; Software ; Genome ; Genomics/methods ; Eukaryota/genetics ; Algorithms |
Abstract: | Large-scale genomic initiatives, such as the Earth BioGenome Project, require efficient methods for eukaryotic genome annotation. Here we present an automatic gene finder, GeneMark-ETP, integrating genomic-, transcriptomic-, and protein-derived evidence that has been developed with a focus on large plant and animal genomes. GeneMark-ETP first identifies genomic loci where extrinsic data are sufficient for making gene predictions with "high confidence." The genes situated in the genomic space between the high-confidence genes are predicted in the next stage. The set of high-confidence genes serves as an initial training set for the statistical model. Further on, the model parameters are iteratively updated in the rounds of gene prediction and parameter re-estimation. Upon reaching convergence, GeneMark-ETP makes the final predictions and delivers the whole complement of predicted genes. GeneMark-ETP outperforms gene finders using a single type of extrinsic evidence. Comparisons with gene finders MAKER2 and TSEBRA, those that use both transcript- and protein-derived extrinsic evidence, show that GeneMark-ETP delivers state-of-the-art gene-prediction accuracy, with the margin of outperforming existing approaches increasing in its application to larger and more complex eukaryotic genomes. (© 2024 Brůna et al.; Published by Cold Spring Harbor Laboratory Press.) |
Comments: | Update of: bioRxiv. 2024 Apr 17:2023.01.13.524024. doi: 10.1101/2023.01.13.524024. (PMID: 36711453) |
Grant Information: | R01 GM128145 United States GM NIGMS NIH HHS |
Entry Date(s): | Date Created: 20240612 Date Completed: 20240625 Latest Revision: 20240709 |
Update Code: | 20240709 |
PubMed Central ID: | PMC11216313 |
DOI: | 10.1101/gr.278373.123 |
PMID: | 38866548 |
Database: | MEDLINE |
ISSN: | 1549-5469 |