← Back to context

Comment by Gehinnn

5 years ago

TL;DR: Compile SQLite to JS with emscripten, implement a virtual fs in JS to stream chunks of a statically hosted (readonly) SQL database.

If queries make use of indices, only a fraction of the database needs to be downloaded.

Also, you can use SQLite to query the DOM.

I can't figure out exactly how it knows which chunk to download. Does it always download the whole index first? Or does it include it in the built JS file itself?

  • Both the index and table data are btrees. These are trees - the root node sits in some known location (offset) in the file, referenced by the file header and metadata. As SQLite traverses the tree, it encounters new descendents it would like to visit, presumably identified by their byte offset in the file, which is all needed for this VFS magic to issue a suitable range request.

    - SQlite opens the file and reads 4kb worth of header -> range request for byte 0-4096

    - headers/metadata refers to index table with root node at 8192kb

    - user issues SELECT * from index WHERE name = 'foo'

    - SQLite reads root node from the file (range request for 8192kb..)

    - Root node indicates left branch covers 'foo'. Left branch node at address 12345kb

    - Fetch left branch (range request for 12345kb)

    - New node contains an index entry for 'foo', row 55 of data page at 919191kb

    - SQLite reads data page (range request for 91919191kb..)

    etc etc etc

    • Thanks, I too was struggling to understand how it's able to do such efficient targeted range requests, you explained it nicely.

  • SQLite has runtime-pluggable VFS support, i.e. you give it a struct with functions for opening a file, reading some bytes, writing some bytes, synchronizing file contents, closing a file. This project provides such a VFS module, that, because it actually runs in the browser, performs HTTP requests to read data. Emscripten provides a way to run a mix of C/C++ code in the same environment as some JavaScript code inside the browser. The reason SQLite has this pluggable VFS support is to properly support embedded systems, different locking APIs, and things like database encryption.

    https://www.sqlite.org/vfs.html

  • That's part of SQLite. It has been optimised to reduce disk reads, because those can be slow on spinning hard drives. Coincidentally, this translates well into an optimised algorithm that minimises the amount of HTTP range requests to make.

  • B-Tree indexes are designed to work like this, to require a low number of IO operations. The index contains pointers to other places in the index.