Effective machine-learning assembly for next-generation amplicon sequencing with very low coverage

Bibliographic details
Title: Effective machine-learning assembly for next-generation amplicon sequencing with very low coverage
Authors: Thomas K. F. Wong, Louis Ranjard, Allen G. Rodrigo
Source: BMC Bioinformatics
BMC Bioinformatics, Vol 20, Iss 1, Pp 1-12 (2019)
Publication details: Springer Science and Business Media LLC, 2019.
Publication year: 2019
Subject terms: Computer science, Assembly, lcsh:Computer applications to medicine. Medical informatics, Biochemistry, Genome, DNA sequencing, Machine Learning, 03 medical and health sciences, chemistry.chemical_compound, 0302 clinical medicine, Structural Biology, Animals, Mitochondrion, lcsh:QH301-705.5, Molecular Biology, 030304 developmental biology, Sequence (medicine), Macropodidae, 0303 health sciences, Base Sequence, Nucleotides, Applied Mathematics, High-Throughput Nucleotide Sequencing, Correction, Western-grey kangaroo, Amplicon, Computer Science Applications, lcsh:Biology (General), chemistry, Genome, Mitochondrial, Amplicon sequencing, Key (cryptography), lcsh:R858-859.7, DNA microarray, Algorithm, Algorithms, 030217 neurology & neurosurgery, DNA, Research Article, Reference genome
Description:
Background: In short-read DNA sequencing experiments, read coverage is a key parameter for successfully assembling the reads and reconstructing the sequence of the input DNA. When coverage is very low, reconstructing the original sequence from the reads can be difficult because of uncovered gaps. Reference-guided assembly can then improve these assemblies. However, when the available reference is phylogenetically distant from the sequencing reads, the mapping rate of the reads can be extremely low. Some recent improvements in read-mapping approaches aim at modifying the reference dynamically according to the reads. Such approaches can significantly improve the alignment rate of the reads onto distant references, but the processing of insertions and deletions remains challenging.
Results: Here, we introduce a new algorithm to update the reference sequence according to previously aligned reads. Substitutions, insertions and deletions are performed in the reference sequence dynamically. We evaluate this approach by assembling a western-grey kangaroo mitochondrial amplicon. Our results show that more reads can be aligned and that this method produces assemblies of a length comparable to the truth while limiting the error rate, whereas classic approaches fail to recover the correct length. Finally, we discuss how the core algorithm of this method could be improved and combined with other approaches to analyse larger genomic sequences.
Conclusions: We introduced an algorithm to perform dynamic alignment of reads on a distant reference. We showed that such an approach can improve the reconstruction of an amplicon compared with classical bioinformatic pipelines. Although the method is not portable to the genomic scale in its current form, we suggested several improvements to be investigated to make it more flexible and allow dynamic alignment to be used for large genome assemblies.
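The description states that the reference is updated with substitutions, insertions and deletions according to previously aligned reads, but this record does not give the authors' algorithm. The sketch below therefore swaps in a simpler, well-known idea as an illustration only: repeatedly re-align every read to the current reference with a fitting alignment (the whole read against any substring of the reference), then replace the reference with a majority-vote consensus until it stops changing. The function names (`fit_align`, `consensus`, `polish`), the linear scoring scheme and the `min_depth` threshold are hypothetical choices, not taken from the paper or its software.

```python
# Hypothetical sketch (NOT the authors' algorithm): iterative reference
# updating by re-aligning reads and taking a majority-vote consensus.
from collections import Counter, defaultdict

MATCH, MISMATCH, GAP = 1, -1, -2  # assumed linear scoring scheme


def fit_align(ref: str, read: str) -> tuple[dict[int, str], dict[int, str]]:
    """Fitting alignment: the whole read aligns to some substring of ref.

    Returns (cols, ins): cols maps a ref index to the read base aligned
    there ('-' for a deletion); ins maps a ref index p to the read text
    inserted between ref[p] and ref[p + 1] (p = -1 means before ref[0]).
    """
    n, m = len(read), len(ref)

    def sub(i: int, j: int) -> int:
        return MATCH if read[i - 1] == ref[j - 1] else MISMATCH

    # S[i][j]: best score aligning read[:i] to a ref substring ending at j.
    S = [[0] * (m + 1) for _ in range(n + 1)]  # row 0 = 0: free leading ref
    for i in range(1, n + 1):
        S[i][0] = i * GAP
        for j in range(1, m + 1):
            S[i][j] = max(S[i - 1][j - 1] + sub(i, j),  # (mis)match
                          S[i - 1][j] + GAP,            # read base inserted
                          S[i][j - 1] + GAP)            # ref base deleted
    j = max(range(m + 1), key=lambda k: S[n][k])  # free trailing ref
    i = n
    cols: dict[int, str] = {}
    ins: dict[int, str] = defaultdict(str)
    while i > 0:  # trace back until every read base is placed
        if j > 0 and S[i][j] == S[i - 1][j - 1] + sub(i, j):
            cols[j - 1] = read[i - 1]
            i, j = i - 1, j - 1
        elif S[i][j] == S[i - 1][j] + GAP:
            ins[j - 1] = read[i - 1] + ins[j - 1]  # prepend: walking backwards
            i -= 1
        else:
            cols[j - 1] = "-"
            j -= 1
    return cols, dict(ins)


def consensus(ref: str, reads: list[str], min_depth: int = 2) -> str:
    """Majority vote per ref base and per junction; weak positions keep ref."""
    base_votes = defaultdict(Counter)  # ref index -> votes for base or '-'
    ins_votes = defaultdict(Counter)   # junction after index -> inserted text
    for read in reads:
        cols, ins = fit_align(ref, read)
        if not cols:
            continue  # read placed entirely as an insertion: ignore it
        for pos, base in cols.items():
            base_votes[pos][base] += 1
        lo, hi = min(cols), max(cols)
        for p in set(range(lo, hi)) | set(ins):  # junctions this read informs
            ins_votes[p][ins.get(p, "")] += 1    # '' is a vote for no insertion
    out: list[str] = []

    def add_insertion(p: int) -> None:
        votes = ins_votes.get(p)
        if votes and sum(votes.values()) >= min_depth:
            out.append(votes.most_common(1)[0][0])

    add_insertion(-1)
    for pos, ref_base in enumerate(ref):
        votes = base_votes.get(pos)
        if votes and sum(votes.values()) >= min_depth:
            base = votes.most_common(1)[0][0]
            if base != "-":  # a winning '-' deletes this reference base
                out.append(base)
        else:
            out.append(ref_base)  # too little support: keep the reference base
        add_insertion(pos)
    return "".join(out)


def polish(ref: str, reads: list[str], max_rounds: int = 10,
           min_depth: int = 2) -> str:
    """Re-align and re-call the consensus until the reference is stable."""
    for _ in range(max_rounds):
        new_ref = consensus(ref, reads, min_depth)
        if new_ref == ref:
            break
        ref = new_ref
    return ref
```

For example, three identical reads carrying a substitution relative to `ACGTACGT` pull the reference towards them, so `polish("ACGTACGT", ["ACGAACGT"] * 3)` returns `"ACGAACGT"`, while a single such read (below the assumed `min_depth` of 2) leaves the reference unchanged. Re-aligning every read each round is quadratic in read and reference length, which is only reasonable for short, amplicon-sized references.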
ISSN: 1471-2105
Access URL: https://explore.openaire.eu/search/publication?articleId=doi_dedup___::b5910e0bad231585bc4075f01ccf2182
https://doi.org/10.1186/s12859-019-3287-2
Rights: OPEN
Accession number: edsair.doi.dedup.....b5910e0bad231585bc4075f01ccf2182
Database: OpenAIRE