Comment by hcarvalhoalves

13 years ago

Sounds interesting, but theoretical. Any practical applications? I'm guessing it has some purpose on dealing with long string sequences, like genome sequencing?

From what I understand, it is useful for storing compressed data and querying it without decompression.

Range queries allow for full-text search, for example.

Something like that + data compression. If you have some data type mapped to a sequence of symbols, WT can speed up search in it. Please also note it is not a complete full text search index (like ones used in genome sequencing). It is a rank/select dictionary.

  • I believe he's doing full text search on this by running Burrows-Wheeler on the data first. See here for more details:

       http://alexbowe.com/fm-index/
    

    I started reimplementing all of this in Go about a year ago, but moved onto other stuff before I finished.

    • Sure, that is what I meant. Some additional structures (BWT, ...) are required for a complete FTS. WT allows only 1-symbol searches that might be enough in practical applications (databases?).

      I also have an implementation of dynamic LOUDS-based multiary WT (it is much faster on large alphabets), but it haven't been fully finished yet.