Comment by yoz

5 years ago

As everyone else has been saying, this is amazing work. It sounds like the biggest issue with loading the page is the initial sql.js download - it's about 1.2MB, is that right?

Might it be feasible to easily strip down SQLite so that it only compiles the parts for read-only use? The browser version is obviously somewhat read-only but that's because of the sandbox. I'm talking about excluding the code for CREATE, UPDATE, INSERT and everything else which is just for writing. The aim here would be to produce a significantly smaller WASM binary.

I'm guessing that the answer is no, there's no easy way of doing this without significant rewrites of SQLite's core, but... I can't be the only one to think of this, surely?

The actual transferred data for the sqlite code should only be 550kB (gzip compression).

Stripping out the write parts is a good idea. SQLite actually has a set of compile time flags to omit features [1]. I just tried enabling as many of those as possible, but it didn't seem to reduce wasm size much, though I might be doing something wrong. There's also no easy flags to disable CREATE / UPDATE / INSERT .

[1] https://www.sqlite.org/compile.html#omitfeatures

I'd be curious whether there's any changes which could be made in the file format to optimize for read-only usage. The SQLite format probably has some features which aren't needed in this context -- information about free pages and autoincrement counters isn't relevant in a read-only file, for instance.

  • I think it wouldn't change much - SQLite is already pretty optimized towards reads, for example a write always replaces a whole page and locks the whole DB. The free pages can easily be removed by doing VACUUM beforehand which should be done anyways to balance the b-trees.

    The storage of SQLite is already really efficient, for example integers are always stored as varints so small ones only take a byte. The only thing I think could maybe be improved for this use case is changing the structure of the b-tree to be more columnar - since right now all the data from different columns is intermingled with the btree structure itself, querying a subset of columns has a high overhead.

Sounds feasible to me. Either by replacing all those functions on the C side with empty shells or maybe even with wasm-opt ( but probably the OP has already used it removed all garbage collectible paths.)