Longest Common Extensions with Wildcards: Trade-off and Applications

التفاصيل البيبلوغرافية
العنوان: Longest Common Extensions with Wildcards: Trade-off and Applications
المؤلفون: Bathie, Gabriel, Charalampopoulos, Panagiotis, Starikovskaya, Tatiana
سنة النشر: 2024
المجموعة: Computer Science
مصطلحات موضوعية: Computer Science - Data Structures and Algorithms
الوصف: We study the Longest Common Extension (LCE) problem in a string containing wildcards. Wildcards (also called "don't cares" or "holes") are special characters that match any other character in the alphabet, similar to the character "?" in Unix commands or "." in regular expression engines. We consider the problem parametrized by $G$, the number of maximal contiguous groups of wildcards in the input string. Our main contribution is a simple data structure for this problem that can be built in $O(n (G/t) \log n)$ time, occupies $O(nG/t)$ space, and answers queries in $O(t)$ time, for any $t \in [1 .. G]$. Up to the $O(\log n)$ factor, this interpolates smoothly between the data structure of Crochemore et al. [JDA 2015], which has $O(nG)$ preprocessing time and space, and $O(1)$ query time, and a simple solution based on the ``kangaroo jumping'' technique [Landau and Vishkin, STOC 1986], which has $O(n)$ preprocessing time and space, and $O(G)$ query time. By establishing a connection between this problem and Boolean matrix multiplication, we show that our solution is optimal up to subpolynomial factors when $G = \Omega(n)$ under a widely believed hypothesis. In addition, we develop a new simple, deterministic and combinatorial algorithm for sparse Boolean matrix multiplication. Finally, we show that our data structure can be used to obtain efficient algorithms for approximate pattern matching and structural analysis of strings with wildcards.
Comment: Accepted to ESA 2024
نوع الوثيقة: Working Paper
URL الوصول: http://arxiv.org/abs/2408.03610
رقم الأكسشن: edsarx.2408.03610
قاعدة البيانات: arXiv