Space-Efficient String Indexing for Wildcard Pattern Matching

Moshe Lewenstein, Yakov Nekrich, Jeffrey Scott Vitter

In this paper we describe compressed indexes that support pattern matching queries for strings with wildcards. For a constant-size alphabet, our data structure uses $O(n\log^{\varepsilon}n)$ bits for any $\varepsilon>0$ and reports all $\mathrm{occ}$ occurrences of a wildcard string in $O(m+\sigma^g \cdot\mu(n) + \mathrm{occ})$ time, where $\mu(n)=o(\log\log\log n)$, $\sigma$ is the alphabet size, $m$ is the number of alphabet symbols in the query string, and $g$ is the number of wildcard symbols in the query string. We also present an $O(n)$-bit index with $O((m+\sigma^g+\mathrm{occ})\log^{\varepsilon}n)$ query time and an $O(n(\log\log n)^2)$-bit index with $O((m+\sigma^g+\mathrm{occ})\log\log n)$ query time. These are the first non-trivial data structures for this problem that need $o(n\log n)$ bits of space.

