Approximate Circular Pattern Matching

التفاصيل البيبلوغرافية
العنوان: Approximate Circular Pattern Matching
المؤلفون: Charalampopoulos, Panagiotis, Kociumaka, Tomasz, Radoszewski, Jakub, Pissis, Solon P., Rytter, Wojciech, Waleń, Tomasz, Zuba, Wiktor
سنة النشر: 2022
المجموعة: Computer Science
مصطلحات موضوعية: Computer Science - Data Structures and Algorithms
الوصف: We consider approximate circular pattern matching (CPM, in short) under the Hamming and edit distance, in which we are given a length-$n$ text $T$, a length-$m$ pattern $P$, and a threshold $k>0$, and we are to report all starting positions of fragments of $T$ (called occurrences) that are at distance at most $k$ from some cyclic rotation of $P$. In the decision version of the problem, we are to check if any such occurrence exists. All previous results for approximate CPM were either average-case upper bounds or heuristics, except for the work of Charalampopoulos et al. [CKP$^+$, JCSS'21], who considered only the Hamming distance. For the reporting version of the approximate CPM problem, under the Hamming distance we improve upon the main algorithm of [CKP$^+$, JCSS'21] from ${\cal O}(n+(n/m)\cdot k^4)$ to ${\cal O}(n+(n/m)\cdot k^3\log\log k)$ time; for the edit distance, we give an ${\cal O}(nk^2)$-time algorithm. We also consider the decision version of the approximate CPM problem. Under the Hamming distance, we obtain an ${\cal O}(n+(n/m)\cdot k^2\log k/\log\log k)$-time algorithm, which nearly matches the algorithm by Chan et al. [CGKKP, STOC'20] for the standard counterpart of the problem. Under the edit distance, the ${\cal O}(nk\log^3 k)$ runtime of our algorithm nearly matches the ${\cal O}(nk)$ runtime of the Landau-Vishkin algorithm [LV, J. Algorithms'89]. As a stepping stone, we propose ${\cal O}(nk\log^3 k)$-time algorithm for the Longest Prefix $k'$-Approximate Match problem, proposed by Landau et al. [LMS, SICOMP'98], for all $k'\in \{1,\dots,k\}$. We give a conditional lower bound that suggests a polynomial separation between approximate CPM under the Hamming distance over the binary alphabet and its non-circular counterpart. We also show that a strongly subquadratic-time algorithm for the decision version of approximate CPM under edit distance would refute SETH.
Comment: Accepted to ESA 2022. Abstract abridged to meet arXiv requirements
نوع الوثيقة: Working Paper
URL الوصول: http://arxiv.org/abs/2208.08915
رقم الأكسشن: edsarx.2208.08915
قاعدة البيانات: arXiv