Academic Journal

Differentially used codons among essential genes in bacteria identified by machine learning-based analysis.

Bibliographic Details
Title: Differentially used codons among essential genes in bacteria identified by machine learning-based analysis.
Authors: Kurmi A; Department of Computer Science and Engineering, Tezpur University, Napaam, Assam, 784028, India.; Department of Computer Science and Engineering, The Assam Kaziranga University, Jorhat, Assam, 785006, India., Sen P; Department of Computer Science and Engineering, Tezpur University, Napaam, Assam, 784028, India., Dash M; Department of Electronics and Communication Engineering, NIT, Jote, Arunachal Pradesh, 791113, India., Ray SK; Department of Molecular Biology and Biotechnology, Tezpur University, Napaam, Assam, 784028, India., Satapathy SS; Department of Computer Science and Engineering, Tezpur University, Napaam, Assam, 784028, India. ssankar@tezu.ernet.in.
Source: Molecular genetics and genomics : MGG [Mol Genet Genomics] 2024 Jul 27; Vol. 299 (1), pp. 72. Date of Electronic Publication: 2024 Jul 27.
Publication Type: Journal Article
Language: English
Journal Info: Publisher: Springer-Verlag Country of Publication: Germany NLM ID: 101093320 Publication Model: Electronic Cited Medium: Internet ISSN: 1617-4623 (Electronic) Linking ISSN: 16174623 NLM ISO Abbreviation: Mol Genet Genomics Subsets: MEDLINE
Imprint Name(s): Original Publication: Berlin : Springer-Verlag, c2001-
MeSH Terms: Machine Learning*; Genes, Essential*/genetics; Codon Usage*; Escherichia coli*/genetics; Genome, Bacterial/genetics; Genes, Bacterial; Codon/genetics; Bacteria/genetics; Bacteria/classification
Abstract: Codon usage bias (CUB), the uneven usage of synonymous codons encoding the same amino acid, differs among genes within and across bacterial genomes. CUB is known to be influenced by gene expression, and accordingly it differs between high-expression and low-expression genes in several bacteria. In this article, we extend the study of codon usage by considering gene essentiality as a feature. Using machine learning (ML) based approaches, we analysed Relative Synonymous Codon Usage (RSCU) values of essential and non-essential genes in Escherichia coli and thirty-four other bacterial genomes whose gene essentiality features were available in public databases. We observed significant differences in codon usage patterns between essential and non-essential genes for the majority of the bacterial genomes, and accordingly ML-based classifiers achieved high area under curve (AUC) scores, with a minimum score of 70.0 across twenty-eight organisms. Further, the importance of individual codons for classifying genes was found to differ among the codons in each genome. The Arg codon CGT and the Gly codon GGT were observed to be the most preferred codons among essential genes in Escherichia coli. Interestingly, some codons, such as CGT, ATA, GGT and GGG, were observed to contribute consistently towards classifying essential genes across the thirty-five bacterial genomes studied. On the other hand, codons TGY and CAY, encoding the amino acids Cys and His respectively, were among the least contributing codons towards classification in all these bacteria. This study demonstrates gene essentiality-based differences in synonymous codon usage in bacterial genomes and presents a common codon usage pattern across bacteria.
(© 2024. The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature.)
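Illustrative note: the abstract's central feature, RSCU, is conventionally defined (Sharp and Li 1986) as a codon's observed count divided by the count expected if all synonymous codons for that amino acid were used equally. The Python sketch below shows that calculation only. It is not the authors' pipeline; the function name `rscu`, the two-family codon table and the toy sequence are assumptions made for illustration.

```python
from collections import Counter

# Illustrative subset of the standard genetic code (assumption: only the
# Gly and Arg families are listed; a real analysis would cover all
# degenerate amino acids).
SYNONYMOUS = {
    "Gly": ["GGT", "GGC", "GGA", "GGG"],
    "Arg": ["CGT", "CGC", "CGA", "CGG", "AGA", "AGG"],
}


def rscu(cds, families=SYNONYMOUS):
    """Return RSCU per codon for one in-frame coding sequence.

    RSCU = observed count / (family total / number of synonymous codons).
    Families with no observed codons are skipped.
    """
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    counts = Counter(cds[i:i + 3] for i in range(0, usable, 3))
    values = {}
    for family in families.values():
        total = sum(counts[c] for c in family)
        if total == 0:
            continue
        expected = total / len(family)  # count under equal usage
        for codon in family:
            values[codon] = counts[codon] / expected
    return values


# Hypothetical toy sequence: GGT x3, GGG x1, CGT x2, AGA x1.
toy_cds = "GGTGGTGGTGGGCGTCGTAGA"
toy_rscu = rscu(toy_cds)
```

A value above 1 marks a codon used more often than equal usage would predict. In a setting like the one the abstract describes, per-gene vectors of such values would serve as classifier inputs, with essential versus non-essential as the label.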
Contributed Indexing: Keywords: Bacterial genome; Codon usage bias; Essential genes; Machine learning; Molecular evolution; Selection
Substance Nomenclature: 0 (Codon)
Entry Date(s): Date Created: 20240726 Date Completed: 20240727 Latest Revision: 20240727
Update Code: 20240729
DOI: 10.1007/s00438-024-02163-0
PMID: 39060647
Database: MEDLINE