← Back to context

Comment by jltsiren

7 months ago

Succinct data structures such as wavelet trees are widely used in bioinformatics. There you often have strings that cannot be tokenized or parsed meaningfully, so you just have to deal with sequences of symbols. And because the strings are often very long and/or there can be a huge number of them, the data structures have to be space-efficient.

A wavelet tree is best seen as an intermediate data structure. It doesn't do anything particularly interesting on its own, but it can be used as a building block for higher-level data structures. For example, you can create an FM-index by storing the Burrows–Wheeler transform in a wavelet tree. (Though there are better options when the alphabet is small.) And then you can use the FM-index to find exact matches of any length between the pattern and the indexed strings.

People working with succinct data structures often talk about bitvectors rather than bitmaps. The difference is that bitmaps tend to focus on set operations, while bitvectors are more about random access with rank, select, and related queries. Then you could see wavelet trees as a generalization of bitvectors from a binary alphabet to larger alphabets. And then you have people talking about wavelet trees, when they really mean a wider class of conceptually and functionally similar data structures, regardless of the actual implementation.