Academic Journal

HiCanu: accurate assembly of segmental duplications, satellites, and allelic variants from high-fidelity long reads.

Bibliographic Details
Title: HiCanu: accurate assembly of segmental duplications, satellites, and allelic variants from high-fidelity long reads.
Authors: Nurk S; Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland 20894, USA., Walenz BP; Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland 20894, USA., Rhie A; Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland 20894, USA., Vollger MR; Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195, USA., Logsdon GA; Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195, USA., Grothe R; Pacific Biosciences, Menlo Park, California 94025, USA., Miga KH; UC Santa Cruz Genomics Institute, University of California, Santa Cruz, California 95064, USA., Eichler EE; Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195, USA.; Howard Hughes Medical Institute, University of Washington, Seattle, Washington 98195, USA., Phillippy AM; Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland 20894, USA., Koren S; Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland 20894, USA.
Source: Genome research [Genome Res] 2020 Sep; Vol. 30 (9), pp. 1291-1305. Date of Electronic Publication: 2020 Aug 14.
Publication Type: Evaluation Study; Journal Article; Research Support, N.I.H., Extramural; Research Support, N.I.H., Intramural
Language: English
Journal Info: Publisher: Cold Spring Harbor Laboratory Press; Country of Publication: United States; NLM ID: 9518021; Publication Model: Print-Electronic; Cited Medium: Internet; ISSN: 1549-5469 (Electronic); Linking ISSN: 10889051; NLM ISO Abbreviation: Genome Res; Subsets: MEDLINE
Imprint Name(s): Original Publication: Cold Spring Harbor, N.Y. : Cold Spring Harbor Laboratory Press, c1995-
MeSH Terms: Genetic Variation* ; High-Throughput Nucleotide Sequencing/*methods ; Sequence Analysis, DNA/*methods ; Alleles ; Animals ; Cell Line ; Chromosome Duplication ; DNA, Neoplasm ; DNA, Satellite ; Drosophila/genetics ; Genome, Human ; Haplotypes ; Humans ; Reproducibility of Results ; Software
Abstract: Complete and accurate genome assemblies form the basis of most downstream genomic analyses and are of critical importance. Recent genome assembly projects have relied on a combination of noisy long-read sequencing and accurate short-read sequencing, with the former offering greater assembly continuity and the latter providing higher consensus accuracy. The recently introduced Pacific Biosciences (PacBio) HiFi sequencing technology bridges this divide by delivering long reads (>10 kbp) with high per-base accuracy (>99.9%). Here we present HiCanu, a modification of the Canu assembler designed to leverage the full potential of HiFi reads via homopolymer compression, overlap-based error correction, and aggressive false overlap filtering. We benchmark HiCanu with a focus on the recovery of haplotype diversity, major histocompatibility complex (MHC) variants, satellite DNAs, and segmental duplications. For diploid human genomes sequenced to 30× HiFi coverage, HiCanu achieved superior accuracy and allele recovery compared to the current state of the art. On the effectively haploid CHM13 human cell line, HiCanu achieved an NG50 contig size of 77 Mbp with a per-base consensus accuracy of 99.999% (QV50), surpassing recent assemblies of high-coverage, ultralong Oxford Nanopore Technologies (ONT) reads in terms of both accuracy and continuity. This HiCanu assembly correctly resolves 337 out of 341 validation BACs sampled from known segmental duplications and provides the first preliminary assemblies of nine complete human centromeric regions. Although gaps and errors still remain within the most challenging regions of the genome, these results represent a significant advance toward the complete assembly of human genomes.
(© 2020 Nurk et al.; Published by Cold Spring Harbor Laboratory Press.)
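Illustrative note: the abstract names homopolymer compression as one of HiCanu's techniques and equates a per-base consensus accuracy of 99.999% with QV50. The short Python sketch below shows what those two ideas mean in general terms. It is not HiCanu's implementation; the function names `homopolymer_compress` and `phred_qv` are hypothetical, and the QV conversion assumes the standard Phred scale, QV = -10 * log10(error rate).

```python
import math
from itertools import groupby


def homopolymer_compress(seq: str) -> str:
    """Collapse every run of identical bases into a single base.

    Hypothetical helper for illustration only; HiCanu's own code is not shown here.
    """
    return "".join(base for base, _run in groupby(seq))


def phred_qv(accuracy: float) -> float:
    """Convert a per-base accuracy in (0, 1) to a Phred-scaled quality value.

    Assumes the standard Phred definition: QV = -10 * log10(1 - accuracy).
    """
    error_rate = 1.0 - accuracy
    if not 0.0 < error_rate < 1.0:
        raise ValueError("accuracy must lie strictly between 0 and 1")
    return -10.0 * math.log10(error_rate)


if __name__ == "__main__":
    # Runs such as "TTT" and "CC" each collapse to one base.
    print(homopolymer_compress("GATTTACCA"))
    # 99.999% accuracy corresponds to roughly QV50 on the Phred scale.
    print(round(phred_qv(0.99999), 3))
```

Compressing homopolymer runs removes run-length differences between reads, so two reads that disagree only in the length of such a run become identical after compression.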
Grant Information: F32 GM134558 United States GM NIGMS NIH HHS; R01 HG010169 United States HG NHGRI NIH HHS; R21 HG010548 United States HG NHGRI NIH HHS; U01 HG010971 United States HG NHGRI NIH HHS; R01 HG002385 United States HG NHGRI NIH HHS
Substance Nomenclature: 0 (DNA, Neoplasm)
0 (DNA, Satellite)
Entry Date(s): Date Created: 20200818 Date Completed: 20211029 Latest Revision: 20240329
Update Code: 20240329
PubMed Central ID: PMC7545148
DOI: 10.1101/gr.263566.120
PMID: 32801147
Database: MEDLINE