Academic Journal

Population inference based on mitochondrial DNA control region data by the nearest neighbors algorithm.

Bibliographic Details
Title: Population inference based on mitochondrial DNA control region data by the nearest neighbors algorithm.
Authors: Yang FC; Department of Forensic Medicine, College of Medicine, National Taiwan University, No.1 Jen-Ai Road Section 1, Taipei, 10051, Taiwan., Tseng B; Department of Forensic Medicine, College of Medicine, National Taiwan University, No.1 Jen-Ai Road Section 1, Taipei, 10051, Taiwan., Lin CY; Institute of Forensic Medicine, Ministry of Justice, New Taipei City, 23016, Taiwan., Yu YJ; Department of Forensic Medicine, College of Medicine, National Taiwan University, No.1 Jen-Ai Road Section 1, Taipei, 10051, Taiwan., Linacre A; College of Science & Engineering, Flinders University, Adelaide, 5001, Australia., Lee JC; Department of Forensic Medicine, College of Medicine, National Taiwan University, No.1 Jen-Ai Road Section 1, Taipei, 10051, Taiwan. jimlee@ntu.edu.tw.
Source: International journal of legal medicine [Int J Legal Med] 2021 Jul; Vol. 135 (4), pp. 1191-1199. Date of Electronic Publication: 2021 Feb 14.
Publication Type: Journal Article
Language: English
Journal Info: Publisher: Springer International Country of Publication: Germany NLM ID: 9101456 Publication Model: Print-Electronic Cited Medium: Internet ISSN: 1437-1596 (Electronic) Linking ISSN: 09379827 NLM ISO Abbreviation: Int J Legal Med Subsets: MEDLINE
Imprint Name(s): Original Publication: Heidelberg, FRG : Springer International, c1990-
MeSH Terms: Algorithms* , Locus Control Region* , Phylogeny*, DNA, Mitochondrial/*genetics , Genetics, Population/*methods , Racial Groups/*genetics, Humans ; Indigenous Peoples/genetics ; Taiwan/ethnology
Abstract: Population and geographic assignment are frequently undertaken using DNA sequences from the mitochondrial genome. Assignment to broad continental populations is common, although finer resolution to subpopulations can be less accurate because of shared genetic ancestry at a local level and because members of different ancestral subpopulations cohabit the same geographic area. This study reports on the accuracy of population and subpopulation assignment using sequence data from 3070 mitochondrial genomes and the K-nearest neighbors (KNN) algorithm. The training samples used for continental and population assignment comprised 1105 Europeans (including Austria, France, Germany, Spain, and England and Caucasian countries), 374 Africans (including North and East Africa and a non-specific area (Pan-Africa)), and 1591 Asians (including Japan, Philippines, and Taiwan). The subpopulation data were 1153 mitochondrial DNA (mtDNA) control region sequences from 12 subpopulations in Taiwan (Han, Hakka, Ami, Atayal, Bunun, Paiwan, Puyuma, Rukai, Saisiyat, Tsou, Tao, and Pingpu). Additionally, control region sequence data from a further 50 samples, obtained from the Sigma Company, were included after they were amplified and sequenced. These additional 50 samples acted as the "testing samples" to verify the accuracy of the population assignment. Using genetic distance as the metric, we applied the KNN algorithm and the K-weighted-nearest neighbors (KWNN) algorithm, weighted by genetic distance, to classify individuals into continental populations and into subpopulations within the same continent. Accuracy of ethnic inference at the continental and subpopulation levels was obtained for both the KNN and KWNN algorithms. The training sample set achieved an overall accuracy of 99 to 82% for assignment to continental populations with K values from 1 to 101. Population assignment for subpopulations with K values from 1 to 5 reached an accuracy of 77 to 54%. Four of the 12 Taiwanese populations returned an assignment accuracy of over 60%: Ami (66%), Atayal (67%), Saisiyat (66%), and Tao (80%). For the testing sample set, ethnic prediction for continental populations with the recommended K values of 5, 10, and 35, based on results from the training sample set, achieved an overall accuracy of 100 to 94%. This study provides an accurate method of population assignment for both continental populations and subpopulations, which can be useful in forensic and anthropological studies.
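Illustrative note: the difference between a plain KNN vote and a distance-weighted (KWNN) vote can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation. The abstract does not specify the genetic distance or the weighting formula, so the p-distance (proportion of differing sites between aligned sequences) and the inverse-distance weights are assumptions, and the function names, toy sequences, and labels `PopA`/`PopB` are hypothetical.

```python
from collections import defaultdict


def p_distance(a, b):
    """Proportion of differing sites between two aligned, equal-length sequences.

    Assumption: a simple stand-in for whatever genetic distance the study used.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def classify(query, training, k=5, weighted=False):
    """Assign `query` to a population by a vote among its k nearest neighbors.

    training: list of (aligned_sequence, population_label) pairs.
    weighted=False: plain KNN, each neighbor casts one vote.
    weighted=True:  each neighbor votes with weight 1 / (distance + eps);
                    this inverse-distance scheme is an assumption.
    """
    eps = 1e-9  # avoids division by zero for identical sequences
    neighbors = sorted(
        (p_distance(query, seq), label) for seq, label in training
    )[:k]
    votes = defaultdict(float)
    for dist, label in neighbors:
        votes[label] += 1.0 / (dist + eps) if weighted else 1.0
    return max(votes, key=votes.get)


# Hypothetical toy data: 8-site aligned fragments with made-up labels.
training = [
    ("ACGTACGT", "PopA"),
    ("ACGTTCGT", "PopB"),
    ("ACGTTCGA", "PopB"),
    ("TTTTTTTT", "PopA"),
]
query = "ACGTACGT"

knn_label = classify(query, training, k=3)                  # majority vote
kwnn_label = classify(query, training, k=3, weighted=True)  # weighted vote
```

In this toy case the three nearest neighbors are one identical `PopA` sequence and two slightly more distant `PopB` sequences, so the plain vote and the weighted vote disagree. This is the kind of situation in which the choice of K and of weighting can change the assignment.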
Grant Information: MOST 107-2320-B-002-045-MY3 Ministry of Science and Technology, Taiwan
Contributed Indexing: Keywords: Control region; Nearest neighbors algorithm; Population inference; mtDNA
Substance Nomenclature: 0 (DNA, Mitochondrial)
Entry Date(s): Date Created: 20210215 Date Completed: 20210827 Latest Revision: 20211204
Update Code: 20221213
DOI: 10.1007/s00414-021-02520-3
PMID: 33586030
Database: MEDLINE