Evaluation of GRCh38 and de novo haploid genome assemblies demonstrates the enduring quality of the reference assembly

Bibliographic details
Title: Evaluation of GRCh38 and de novo haploid genome assemblies demonstrates the enduring quality of the reference assembly
Authors: Karyn Meltz Steinberg, Laura Clarke, Hsiu-Chuan Chen, Derek Albracht, Kerstin Howe, Robert S. Fulton, Matthew Boitano, Jonathan Wood, Milinn Kremitzki, Adam M. Phillippy, William Chow, Terence Murphy, Paul Kitts, Deanna M. Church, Sergey Koren, Chen-Shan Chin, Sean McGrath, Sarah Pelan, Tim Hubbard, Nathan Bouk, Glen Threadgold, Kate Auger, Kim D. Pruitt, Heng Li, Richard K. Wilson, Glenn Harden, Vincent Magrini, Tina A. Graves-Lindsay, Françoise Thibaud-Nissen, Chris Markovic, Joanna Collins, Valerie A. Schneider, Jared T. Simpson, Paul Flicek, James Torrance, Richard Durbin
Publication info: Cold Spring Harbor Laboratory, 2016.
Publication year: 2016
Subject terms: Genetics, 0303 health sciences, Word error rate, Genomics, Computational biology, Biology, Base (topology), 03 medical and health sciences, Annotation, 0302 clinical medicine, Path (graph theory), 030217 neurology & neurosurgery, 030304 developmental biology, Reference genome, Coding (social sciences), Sequence (medicine)
Description: The human reference genome assembly plays a central role in nearly all aspects of today’s basic and clinical research. GRCh38 is the first coordinate-changing assembly update since 2009 and reflects the resolution of roughly 1000 issues and encompasses modifications ranging from thousands of single base changes to megabase-scale path reorganizations, gap closures and localization of previously orphaned sequences. We developed a new approach to sequence generation for targeted base updates and used data from new genome mapping technologies and single haplotype resources to identify and resolve larger assembly issues. For the first time, the reference assembly contains sequence-based representations for the centromeres. We also expanded the number of alternate loci to create a reference that provides a more robust representation of human population variation. We demonstrate that the updates render the reference an improved annotation substrate, alter read alignments in unchanged regions and impact variant interpretation at clinically relevant loci. We additionally evaluated a collection of new de novo long-read haploid assemblies and conclude that while the new assemblies compare favorably to the reference with respect to continuity, error rate, and gene completeness, the reference still provides the best representation for complex genomic regions and coding sequences. We assert that the collected updates in GRCh38 make the newer assembly a more robust substrate for comprehensive analyses that will promote our understanding of human biology and advance our efforts to improve health.
Access URL: https://explore.openaire.eu/search/publication?articleId=doi_dedup___::1b04f6124e0254f15d6336b49183d540
https://doi.org/10.1101/072116
Rights: OPEN
Accession number: edsair.doi.dedup.....1b04f6124e0254f15d6336b49183d540
Database: OpenAIRE