Improving Imputation Quality in BEAGLE for Crop and Livestock Data

Bibliographic details
Title: Improving Imputation Quality in BEAGLE for Crop and Livestock Data
Authors: David Cavero, Steffen Weigend, Henner Simianer, Manfred Mayer, Johannes Geibel, Torsten Pook, Chris Carolin Schoen
Source: G3: Genes, Genomes, Genetics, Vol 10, Iss 1, Pp 177-188 (2020)
G3: Genes|Genomes|Genetics
Publication info: Oxford University Press (OUP), 2020.
Publication year: 2020
Subject terms: Crops, Agricultural, reference panel, 0106 biological sciences, Livestock, Word error rate, imputation, Investigations, QH426-470, Biology, 01 natural sciences, Beagle, 03 medical and health sciences, Effective population size, Statistics, Genetics, Animals, Preprocessor, Hidden Markov model, Molecular Biology, reference genome, Genetics (clinical), 030304 developmental biology, 2. Zero hunger, 0303 health sciences, Genetic diversity, beagle, Reference Standards, Human genetics, Haplotypes, Genetic structure, Software, Imputation (genetics), Genome-Wide Association Study, 010606 plant biology & botany, Reference genome
Description: Imputation is one of the key steps in the preprocessing and quality control protocol of any genetic study. Most imputation algorithms were originally developed for use in human genetics and are thus optimized for a high level of genetic diversity. Different versions of BEAGLE were evaluated on genetic datasets of doubled haploids of two European maize landraces, a commercial breeding line and a diversity panel in chicken. These datasets differ in their levels of genetic diversity and structure, which can be taken into account in BEAGLE by parameter tuning. Especially for phasing, BEAGLE 5.0 outperformed the newest version (5.1), which in turn also led to improved imputation. Earlier versions were far more dependent on the adaptation of parameters in all our tests. For all versions, the parameter ne (effective population size) had a major effect on the error rate for imputation of ungenotyped markers, reducing error rates by up to 98.5%. Further improvement was obtained by tuning the parameters affecting the structure of the haplotype cluster that is used to initialize the underlying Hidden Markov Model of BEAGLE. The number of markers with extremely high error rates for the maize datasets was more than halved by the use of a flint reference genome (F7, PE0075 etc.) instead of the commonly used B73. On average, error rates for imputation of ungenotyped markers were reduced by 8.5% by excluding genetically distant individuals from the reference panel for the chicken diversity panel. To optimize imputation accuracy, one has to find a balance between representing as much of the genetic diversity as possible and avoiding the introduction of noise by including genetically distant individuals.
ISSN: 2160-1836
Access URL: https://explore.openaire.eu/search/publication?articleId=doi_dedup___::51065081e4cc8420e4886f27ed6aecfb
https://doi.org/10.1534/g3.119.400798
Rights: OPEN
Accession number: edsair.doi.dedup.....51065081e4cc8420e4886f27ed6aecfb
Database: OpenAIRE
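
Note on the ne parameter: the description above reports that the ne (effective population size) setting had a major effect on imputation error. As a rough illustration only, the sketch below builds BEAGLE 5.x command lines for a small grid of ne values. The file names, the jar path, and the ne values are hypothetical placeholders, and the record does not state which values the authors tested; only the `gt=`, `ref=`, `out=`, and `ne=` arguments are taken from the BEAGLE 5.x command-line interface.

```python
from typing import List, Optional


def beagle_command(gt: str, out: str, ne: int,
                   ref: Optional[str] = None,
                   jar: str = "beagle.jar") -> List[str]:
    """Build an argument list for one BEAGLE 5.x run.

    gt, out, ref and jar are hypothetical paths chosen by the caller;
    ne is the effective population size passed as BEAGLE's ne= argument.
    """
    if ne <= 0:
        raise ValueError("ne must be a positive integer")
    cmd = ["java", "-jar", jar, f"gt={gt}", f"out={out}", f"ne={ne}"]
    if ref is not None:
        # A reference panel is optional; without it BEAGLE only uses the gt= file.
        cmd.append(f"ref={ref}")
    return cmd


def ne_grid(gt: str, out_prefix: str, ne_values: List[int],
            ref: Optional[str] = None) -> List[List[str]]:
    """One command per candidate ne value, each with its own output prefix."""
    return [beagle_command(gt, f"{out_prefix}.ne{ne}", ne, ref)
            for ne in ne_values]


if __name__ == "__main__":
    # Illustrative grid only; suitable values depend on the dataset.
    for cmd in ne_grid("target.vcf.gz", "imputed", [100, 1000, 10000]):
        print(" ".join(cmd))
```

Each printed line can be passed to a shell or to `subprocess.run`. Comparing the runs requires a set of masked genotypes with known truth, which this sketch does not cover.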