Academic Journal

ESKEMAP: exact sketch-based read mapping

Bibliographic Details
Title: ESKEMAP: exact sketch-based read mapping
Authors: Tizian Schulz, Paul Medvedev
Source: Algorithms for Molecular Biology, Vol 19, Iss 1, Pp 1-14 (2024)
Publication Information: BMC, 2024.
Publication Year: 2024
Collection: LCC:Biology (General)
LCC:Genetics
Subject Terms: Sequence sketching, Long-read mapping, Exact algorithm, Dynamic programming, Biology (General), QH301-705.5, Genetics, QH426-470
Description: Abstract. Background: Given a sequencing read, the broad goal of read mapping is to find the location(s) in the reference genome that have a “similar sequence”. Traditionally, “similar sequence” was defined as having a high alignment score and read mappers were viewed as heuristic solutions to this well-defined problem. For sketch-based mappers, however, there has not been a problem formulation to capture what problem an exact sketch-based mapping algorithm should solve. Moreover, there is no sketch-based method that can find all possible mapping positions for a read above a certain score threshold. Results: In this paper, we formulate the problem of read mapping at the level of sequence sketches. We give an exact dynamic programming algorithm that finds all hits above a given similarity threshold. It runs in $\mathcal{O}(|t| + |p| + \ell^2)$ time and $\mathcal{O}(\ell \log \ell)$ space, where $|t|$ is the number of $k$-mers inside the sketch of the reference, $|p|$ is the number of $k$-mers inside the read’s sketch, and $\ell$ is the number of times that $k$-mers from the pattern sketch occur in the sketch of the text. We evaluate our algorithm’s performance in mapping long reads to the T2T assembly of human chromosome Y, where ampliconic regions make it desirable to find all good mapping positions. At a level of precision equivalent to that of minimap2, the recall of our algorithm is 0.88, compared to only 0.76 for minimap2.
Document Type: article
File Description: electronic resource
Language: English
ISSN: 1748-7188
Relation: https://doaj.org/toc/1748-7188
DOI: 10.1186/s13015-024-00261-7
Access URL: https://doaj.org/article/89c9298a0ec2426a9abe382955ff80f2
Accession Number: edsdoj.89c9298a0ec2426a9abe382955ff80f2
Database: Directory of Open Access Journals
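Note: The quantities in the abstract's running-time bound can be illustrated with a toy computation. The following is a minimal sketch, not the ESKEMAP implementation. It assumes that a sketch is given as an ordered list of k-mers, and it reads $\ell$ as the number of positions in the text sketch whose k-mer also appears in the pattern sketch, which is one plausible interpretation of the abstract's wording. The function name `sketch_sizes` and the example k-mers are hypothetical.

```python
def sketch_sizes(text_sketch, pattern_sketch):
    """Return (|t|, |p|, ell) for two sketches given as lists of k-mers.

    Assumption: ell counts the positions in the text sketch whose k-mer
    also appears somewhere in the pattern sketch.
    """
    pattern_kmers = set(pattern_sketch)  # constant-time membership tests
    ell = sum(1 for kmer in text_sketch if kmer in pattern_kmers)
    return len(text_sketch), len(pattern_sketch), ell


# Hypothetical toy sketches (k = 3); real sketches come from a sketching scheme.
text_sketch = ["ACG", "CGT", "GTA", "ACG", "TTT"]
pattern_sketch = ["ACG", "GTA"]

t_len, p_len, ell = sketch_sizes(text_sketch, pattern_sketch)
print(t_len, p_len, ell)        # 5 2 3
print(t_len + p_len + ell**2)   # the |t| + |p| + ell^2 term of the bound: 16
```

Under this reading, the quadratic term depends only on the shared k-mer occurrences, so a read whose sketch shares few k-mers with the reference sketch contributes little beyond the linear scan of both sketches.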