Limbo: A complete rewrite of SQLite in Rust

1 year ago (turso.tech)

Given the code quality and rigid testing, SQLite is probably the last project that should be rewritten. It'd be great to see all other C code rewritten first!

  • That was my take when LibSQL was announced. And it still is and would be my take if LibSQL remains C-coded. But a Rust-coded rewrite of SQLite3 or LibSQL is a different story.

    The SQLite3 business model is that SQLite3 is open source but the best test suite for it is proprietary, and they don't accept contributions to either. This incentivizes anyone who needs support and/or new features in SQLite3 to join the SQLite Consortium. It's a great business model -- I love it. But there are many users who want more of a say than even being a consortium member would grant them, and they want to contribute. For those users only a fork would make sense. But a fork would never gain much traction given that test suite being proprietary, and the SQLite3 team being so awesome.

    However, a memory-safe language re-implementation of SQLite3 is a very different story. The U.S. government wants everyone to abandon C/C++ -- how will they do this if they depend on SQLite3? Apart from that there's also just a general interest and need to use memory-safe languages.

    That said, you're right that there are many other projects that call for a rewrite in Rust way before SQLite3. The thing is: if you have the need and the funding, why wouldn't you rewrite the things you need first? And if SQLite3 is the first thing you need rewritten, why not?

    • >> The SQLite3 business model is that SQLite3 is open source

      This is going to sound pedantic, but SQLite is not Open Source. It's Public Domain. The distinction is subtle, but it is important.

    • Bugs are fixed along with regression tests. Here's a recent example: https://www.sqlite.org/src/info/289daf6cee39625e

      As far as I can see, these tests come with the same public domain dedication as the rest of the code.

      You may be referring to the TH3 tests (https://sqlite.org/th3.html). The main goal (100% branch coverage, 100% MC/DC) would not be achievable for a Rust implementation (or at least an idiomatic Rust implementation …) because of the remaining dynamic run-time checks Rust requires for safety.

    • > The SQLite3 business model is that SQLite3 is open source but the best test suite for it is proprietary

      no.

      the business model is services, and a red phone to companies who use sqlite in production. like nokia back in the days when we had these little flip phones, or desk phones had a "Rolodex" built in, or many other embedded uses of a little lovely dependable data store.

      the services include porting to and "certification" on specifically requested hardware and OS combinations, with indeed proprietary test suites. now these are not owned by sqlite, but by third parties. which license them to sqlite (the company).

      and it started with being paid by the likes of nokia or IBM to make sqlite production ready, add mc/dc coverage, implement fuzzing, etc etc etc.,

      their license asks you to do good not evil. and they take that seriously and try their best to do the same. their own stuff is to an extreme extent in the public domain.

    • > The U.S. government wants everyone to abandon C/C++

      That's the position of two federal agencies, namely, FBI and CISA. They don't describe how this change will reduce CVEs or why the languages they prefer still produce projects with CVEs.

      I don't hold the technical or social acumen of the FBI or CISA in particularly high regard, and I'm not sure why anyone would by default. Mostly because they say things like "switch to python!" without once accounting for the fact that python is written in C.

      It's an absurd point to invoke as a defense of this idea.

    • Why does the fork have to gain traction?

      You keep and maintain your local fork that does what you need it to do. Perhaps, if you are charitable, you share it with others. But you don't need to do this, and it just adds support burden.

    • > The U.S. government wants everyone to abandon C/C++ -- how will they do this if they depend on SQLite3?

      ABI, the same way you don't need the Linux kernel to be rewritten to remove your app dependency on C/C++

    • Just stumbled onto this forum. Really appreciated such a thoughtful and insightful comment. Nice corner of the internet you have here.

  • > Given the code quality and rigid testing, SQLite is probably the last project that should be rewritten.

    That was my take for many years but I have come around 180 degrees on this. I think at this point it's very likely, and probably necessary, that SQLite eventually gets rewritten. In part because of what is called out in the blog post: the tests are not public. More importantly, the entire project is not really open. And to be clear: that is okay. The folks who are building it want to have it that way, and that's the contract we have as users.

    But that does make certain things really tricky that are quite exciting. So yes, I do think that SQLite could need some competition. Even just for finding new ways to influence the original project.

    • This reminds me of VIM - and after quite some time I believe that all VIM users will agree that adding NeoVIM to the ecosystem improved VIM itself. VIM 8 addressed over half the issues that led to the NeoVIM fork in the first place - with the exception of the issue of user contributions, of course.

  • A company that works with SQLite and prefers to write Rust has the expertise needed to rewrite SQLite in Rust. That’s what they’re doing.

    All the other C code could be rewritten, this doesn’t stop or slow down any such effort. But for sure it was never going to be possible for a database provider to start making a memory safe implementation of libpng or something.

  • Seems like a potentially interesting project to get rid of sqlite's compatibility baggage e.g. non-strict tables, opt-in foreign keys, the oddities around rowid tables, etc... as well as progress the dialect a bit (types and domains for instance).
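
    To make the "opt-in foreign keys" point concrete, here is a minimal sketch using Python's standard `sqlite3` module against stock SQLite (not Limbo). The table names and the helper `orphan_insert_allowed` are made up for illustration, and the pragma is set explicitly in both cases because a build can be compiled with enforcement on by default.

    ```python
    import sqlite3

    def orphan_insert_allowed(enforce: bool) -> bool:
        """Return True if a child row pointing at a missing parent is accepted."""
        # isolation_level=None keeps the connection in autocommit mode, so the
        # pragma is never issued inside an open transaction (where it is a no-op).
        con = sqlite3.connect(":memory:", isolation_level=None)
        try:
            con.execute(f"PRAGMA foreign_keys = {'ON' if enforce else 'OFF'}")
            con.execute("CREATE TABLE parent (id INTEGER PRIMARY KEY)")
            con.execute(
                "CREATE TABLE child ("
                "id INTEGER PRIMARY KEY, "
                "parent_id INTEGER REFERENCES parent(id))"
            )
            try:
                # No parent row with id 42 exists.
                con.execute("INSERT INTO child (parent_id) VALUES (42)")
                return True
            except sqlite3.IntegrityError:
                return False
        finally:
            con.close()

    print("accepted without enforcement:", orphan_insert_allowed(False))
    print("accepted with enforcement:", orphan_insert_allowed(True))
    ```

    The same DDL behaves differently depending on a per-connection setting, which is the kind of baggage being described.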

    • But the article mentions that they intend to have full compatibility:

        > Our goal is to build a reimplementation of SQLite from scratch, fully compatible at the language and file format level, with the same or higher reliability SQLite is known for, but with full memory safety and on a new, modern architecture.

  • As a counterpoint, doing a rewrite of an example of the best C codebases gives you a much more interesting comparison between the languages. Rewriting a crappy C codebase in a modern, memory safe language is virtually guaranteed to result in something better. If a carefully executed rewrite of SQLite in Rust doesn't produce improvements (or some difficult tradeoffs), that's very informative about the relative virtues of C and Rust.

  • Code quality is not the only thing to consider. Some people would love to see something like SQLite with 2 important changes: referential integrity that respects the DDL and strict tables that also respect the DDL.
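
    A small sketch of what "respecting the DDL" means for column types, using Python's standard `sqlite3` module against stock SQLite. The helper `stored_type` and the table `t` are hypothetical; `STRICT` tables need SQLite 3.37 or newer, so that half is guarded by a version check.

    ```python
    import sqlite3

    def stored_type(create_sql: str) -> str:
        """Insert text into an INTEGER column and report what SQLite did with it."""
        con = sqlite3.connect(":memory:")
        try:
            con.execute(create_sql)
            con.execute("INSERT INTO t (n) VALUES ('not a number')")
            return con.execute("SELECT typeof(n) FROM t").fetchone()[0]
        except sqlite3.Error as exc:
            return f"rejected: {type(exc).__name__}"
        finally:
            con.close()

    # Ordinary table: the declared type is only an affinity, so the text is kept.
    flexible = stored_type("CREATE TABLE t (n INTEGER)")

    # STRICT table: the declared type is enforced (when the library supports it).
    if sqlite3.sqlite_version_info >= (3, 37, 0):
        strict = stored_type("CREATE TABLE t (n INTEGER) STRICT")
    else:
        strict = "STRICT tables unavailable"

    print("ordinary table stored:", flexible)
    print("strict table:", strict)
    ```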

    • I might be missing something—is there a reason why rewriting it in Rust would be a prerequisite to adding these features, vs just starting a fork?

      And in this case the project intends to be fully compatible, so they wouldn't be able to unilaterally adopt or drop features anyway.

  • I agree to a degree: SQLite is a master class in testing and quality. However, considering how widely used it is (essentially every client application on the planet) and that it does get several memory safety CVEs every year, there is some merit in a rewrite in a memory safe language.

  • While I agree with you on one level, that same code rigidity and testing means that a port of SQLite is much more viable than a port of most other C-based projects. And I'm intrigued by what this would enable, e.g. the WASM stuff the authors mention. It's not that it couldn't be done in C but it'll be easier for a wider range of contributors to do it in Rust.

When the initial SQLite3->LibSQL fork was announced I was pretty negative about it because SQLite3 has a wonderful, 100% branch coverage test suite that is proprietary, and so without access to that any fork would be bound to fail.

However, if there's a big product behind the fork, and even better, a rewrite to a memory-safe language, then the fork begins to make a lot of sense. So, hats off to y'all for pulling this off!

  • Good luck for sure, but browsing their compatibility matrix, it looks like they are a LONG way off. By the looks of it, they have mostly read compatibility with little write capabilities (no alter table, for example).

    • That's fully in line with what they're announcing here. It's the announcement of a new project that has passed the prototyping stage, but one that has not reached the 1.0 stage.

All this talk of “SQLite is not open contribution” never seems to consider that a project being “open contribution” doesn't mean the maintainers will accept your contributions.

They have a process for contributions to follow: you suggest a feature, they implement it. It's far from the only project to take such a stance.

Just in the SQLite “ecosystem” see the contribution policies of Litestream and LiteFS. I don't see people brandishing the “not open contribution” label at Ben's projects.

https://github.com/superfly/litefs?tab=readme-ov-file#contri...

https://github.com/benbjohnson/litestream?tab=readme-ov-file...

> SQLite’s test suite is proprietary

This is literally the first time I've ever heard of this, for any project anywhere. I suppose Android is built a bit in this way, but that's a whole other can of worms.

  • They have a test suite that is part of the public domain SQLite3 product, and they have a much bigger and better test suite that is proprietary.

  • they do not fully own said proprietary sql test suite. they've licensed it. that's why they can _run_ it but not publish it or share it. That's at least how I remember Richard Hipp describing the situation at a talk.

  • It could be simply to prevent forks, but if it really is 100% branch coverage, why do they still have memory-safety-related CVEs coming out? With ASan turned on, and full static analysis, that should make such errors exceedingly rare. Part of the benefit of Rust is that it makes coverage both easier to get due to its type system, and less necessary because of the guarantees it makes. But if they really went all the way to 100% branch coverage, that should be almost as good if all the sanitizers are running.
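
    One possible answer, as a toy sketch rather than anything taken from SQLite: full branch coverage exercises every branch outcome, not every combination of outcomes. The function `pick` below is invented for illustration. Two tests take each `if` both ways and pass, yet a third combination indexes past the end of the buffer. Python raises `IndexError` here; the equivalent C would be an out-of-bounds read that a sanitizer only reports if some test actually drives that path.

    ```python
    def pick(buf, shift: bool, wrap: bool):
        """Return an element whose index depends on two independent flags."""
        i = 0
        if shift:
            i += 2
        if wrap:
            i += 2
        return buf[i]

    data = [10, 20, 30]

    # These two calls give 100% branch coverage of pick(): each `if` is
    # taken once and skipped once, and both calls succeed.
    covered = [pick(data, True, False), pick(data, False, True)]

    # The uncovered combination is the one that goes out of bounds.
    try:
        pick(data, True, True)
        overflowed = False
    except IndexError:
        overflowed = True

    print(covered, overflowed)
    ```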

  • Large chunks of the test suite are open source, committed to the repo and easy to run with a `make test`.

    Every time a bug is reported in the forums, the open source tests are updated as part of the bug fix for everyone to see.

    There's a separate test suite that offers 100% coverage, that is proprietary, and which was created for certification for use in safety critical environments.

    HN loves to discuss business models for open source, but apparently has a problem with this one. Why?

You can have a memory safe SQLite today if you compile it with Fil-C. Only tiny changes required. I almost have it passing the test suite (only two test failures left, both of which look specious).

  • Did a little reading on Fil-C and… “Also, it's slow – about 1.5x-5x slower than legacy C.”

    So that’s dead on arrival.

Clearly still in very early days:

    uv run --with pylimbo --python 3.13 python

Then:

    >>> import limbo
    >>> con = limbo.connect("/tmp/content.db")
    thread '<unnamed>' panicked at core/schema.rs:186:18:
    not yet implemented: Expected CREATE TABLE statement
    note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

With that environment variable:

    stack backtrace:
    0: _rust_begin_unwind
    1: core::panicking::panic_fmt
    2: limbo_core::util::parse_schema_rows
    3: _limbo::__pyfunction_connect
    4: pyo3::impl_::trampoline::trampoline

  • Hey Simon! However early you think it is, I can guarantee it is even earlier =)

    If this is just a standard sqlite database that you are trying to open, though, I'd have expected it to work.

I'm not buying the rationale in the "async IO" section.

First, there's no need to rewrite anything to add an async interface to sqlite if you want (many clients do, whether local or remote).
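
A minimal sketch of that kind of wrapper, using only Python's standard library: the blocking `sqlite3` call runs on a worker thread via `asyncio.to_thread`, so the event loop stays free while the query executes. The helper name `query_async` is hypothetical, and this is not a claim about how any particular client implements it.

```python
import asyncio
import sqlite3

async def query_async(db_path: str, sql: str, params=()):
    """Run one blocking SQLite query on a worker thread and await its rows."""
    def run():
        # Open the connection inside the worker thread so it is created
        # and used on the same thread.
        con = sqlite3.connect(db_path)
        try:
            return con.execute(sql, params).fetchall()
        finally:
            con.close()
    return await asyncio.to_thread(run)

rows = asyncio.run(query_async(":memory:", "SELECT 1 + 1"))
print(rows)
```

Opening a connection per call keeps the sketch short; a real wrapper would more likely keep a long-lived connection on a dedicated thread.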

The issue with sqlite's synchronous interface is leaving a thread idle while you wait for IO. But I wonder how much of an issue that really is. sqlite is designed to run very locally to the storage, and can make use of native file caching, etc, which makes IO blocking very short if not zero. You wonder if applications have enough idling sqlite threads to justify the switching. (It's not free and would be at quite a fine-grained level.)

The section does mention remote storage, but in that case you're much better off with an async client talking to compute running sqlite, sync interface and all, that is very local to the storage. AKA, a client/server database.

Also, in the WASM section, we're still talking about something that would best be implemented as a sqlite client/wrapper, with no need at all to rewrite it.

  • > The issue with sqlite's synchronous interface is leaving a thread idle while you wait for IO

    That's not the only issue. Waiting for the result of every read before you can queue the next read is also an issue, particularly for a VFS that exists on a network (which is a target of theirs, they explicitly mention S3).

    I'm not sure if they also are doing work on improving this, but I'm sure that, in theory, many of the reads and writes SQLite issues do not depend on all previous reads and writes, which means you could queue many of them earlier. If your latency to storage is large, this can be a huge performance difference.
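
    A rough simulation of the difference, assuming independent page reads and a fixed round-trip latency. `SimulatedStorage`, `LATENCY`, and the page numbers are all invented for the sketch; no real VFS or S3 call is involved.

    ```python
    import asyncio

    LATENCY = 0.01  # assumed round trip to remote storage, in seconds

    class SimulatedStorage:
        """Pretend remote storage that tracks how many reads overlap."""

        def __init__(self):
            self.in_flight = 0
            self.peak_in_flight = 0

        async def read_page(self, page_no: int) -> bytes:
            self.in_flight += 1
            self.peak_in_flight = max(self.peak_in_flight, self.in_flight)
            await asyncio.sleep(LATENCY)  # stands in for the network wait
            self.in_flight -= 1
            return f"page-{page_no}".encode()

    async def read_sequentially(storage, pages):
        # Each read waits for the previous one: about len(pages) * LATENCY.
        return [await storage.read_page(p) for p in pages]

    async def read_concurrently(storage, pages):
        # All reads are queued up front: about one LATENCY in total.
        return list(await asyncio.gather(*(storage.read_page(p) for p in pages)))

    pages = list(range(8))
    seq_storage, con_storage = SimulatedStorage(), SimulatedStorage()
    seq_result = asyncio.run(read_sequentially(seq_storage, pages))
    con_result = asyncio.run(read_concurrently(con_storage, pages))
    print(seq_storage.peak_in_flight, con_storage.peak_in_flight)
    ```

    Both strategies return the same pages; only the number of overlapping requests, and therefore the total wait, differs.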

  • You can get more total IO throughput (at the cost of latency) by queueing up multiple reads and writes concurrently. You can do this with threads, but io_uring should theoretically go faster (but don't take my word for it, let's wait for benchmarks).

    I'm personally interested in the potential for async bindings for Python. Making fast async wrappers for blocking APIs in Python-land is painful (although it might improve in the future with nogil).

    • They had been talking about making the high-level interface to sqlite async (sqlite3_step()).

      With io_uring you're talking about the low-level, where blocks are actually read and written.

      As-is, sqlite is agnostic on that point. It doesn't do I/O directly, but uses an OS abstraction layer, called VFS. VFS implementations for common platforms are built-in, but you can create your own that handles storage IO any way you like, including queuing reads and writes concurrently using io_uring.

      So that's not a reason to rewrite sqlite.

      (In fact, I'd be surprised if they weren't looking at io_uring, and, if it seemed likely to generally improve performance, to provide an option to use it, either in the existing linux-vfs or in some other way.)

      > I'm personally interested in the potential for async bindings for Python.

      Well, it's perfectly possible to do that with the current sqlite. It may be painful, as you say, but not even remotely at the level of pain a complete rewrite entails.

The license is "Copyright 2024 the Limbo authors". How is that possible if Limbo is based on a rewrite?

Do they claim a clean room implementation?

It seems wise of SQLite to close down their test suite. That's a great idea I wish I had heard about earlier.

  • SQLite is in the public domain. It is perfectly legal to create a derivative work from a public domain project and license it however you want. It's not cool and kind of a dick move to put it under a more restrictive license, but it's legal.

    • It's not a dick move if you are making legitimate improvements -- especially if you still reference the origin. That's literally the idea behind public domain

    • However that only really works for people who are satisfied with SQLite's public domain licensing. If you are in a jurisdiction that doesn't allow you to dedicate a work to the public domain and are worried about the SQLite developers suing you for infringement at some point, Limbo holds the exact same risk of SQLite suing you.

  • SQLite is in the public domain (i.e. not copyrighted), so a clean room implementation is unnecessary.

So many good things with incremental improvements in the space, but as a consumer it kinda stresses me out having to worry about libsql vs sqlite vs duckdb etc.

I personally use SQLite and DuckDB daily, but recently adopted turso in lieu of litestream for something. I appreciate that they all are relatively compatible but I'd love to just have a tool.

Even then, that's why I love the relationship between SQLite and DuckDB. I can backend my system with SQLite and run analytics and processing via DuckDB, and they serve specific purposes.

The hard thing with this for me is being a split consumer and not having the bandwidth to split my attention between who is doing better innovation and just using a tool I can rely on to predictably get the job done for me.

That being said, hats off, this is awesome. I really appreciate turso.

I am assuming that DO-178B certification for the Rust variant is not on the table.

https://www.sqlite.org/hirely.html

https://www.sqlite.org/qmplan.html

https://www.sqlite.org/th3.html

The name "Limbo" is also used by a post-C/UNIX language from AT&T for the Inferno operating system.

https://en.wikipedia.org/wiki/Limbo_(programming_language)

> To complete the puzzle, we wanted to deterministically test the behavior of the database when interacting with the operating system and other components. To do that, we are partnering with Antithesis

Are there any open source DST projects, even just getting started? I don't even know how/where to start if I would want to do the same on a small app, but can't afford nor want to depend long term on a commercial license.
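
One way to get a feel for the idea on a small app, sketched under heavy assumptions: route all randomness (including injected faults) through a single seeded generator, check an invariant at the end of each run, and sweep seeds. Everything here (`SimDisk`, `run_simulation`, the fault rate, the invariant) is invented for illustration and is far simpler than what a commercial platform or a production database does.

```python
import random

class SimDisk:
    """In-memory stand-in for a disk; writes fail according to a seeded RNG."""

    def __init__(self, rng: random.Random, fail_rate: float = 0.2):
        self.rng = rng
        self.fail_rate = fail_rate
        self.blocks = {}

    def write(self, key: int, value: int) -> None:
        if self.rng.random() < self.fail_rate:
            raise OSError("injected write fault")
        self.blocks[key] = value


def run_simulation(seed: int, steps: int = 50) -> list:
    """Drive the toy store with one seed and return the full event trace."""
    rng = random.Random(seed)  # the only source of randomness in the run
    disk = SimDisk(rng)
    acknowledged = {}          # writes the application believes are durable
    trace = []
    for _ in range(steps):
        key, value = rng.randrange(5), rng.randrange(1000)
        try:
            disk.write(key, value)
            acknowledged[key] = value
            trace.append(("ok", key, value))
        except OSError:
            trace.append(("fault", key, value))
    # Invariant under test: every acknowledged write is really on the disk.
    assert acknowledged == disk.blocks, f"invariant broken for seed {seed}"
    return trace


# Sweep many seeds; any seed that fails can be replayed exactly.
for seed in range(200):
    run_simulation(seed)
```

Because the seed fully determines the run, a failure found in the sweep reproduces on demand, which is the property that makes this style of testing useful.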

A side topic: is there a nice big extensive free test suite for sql, for people interested in making toy databases to use?

OT: for just a second I thought it was a rewrite of the Limbo programming language. Might be a fun side project! :)

Are there any plans for the python bindings to support an async interface?

  • That would be a really cool feature! I've been running sqlite3 in async python for Datasette for six years now but it involves some pretty convoluted threading mechanisms, having native async would be fantastic.

I would love to see it succeed!

They mention testing that bytecode generation generates the exact same results as SQLite... Does this exclude writing new optimization passes that are not in sqlite?
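
For anyone who wants to see the bytecode in question, stock SQLite will print the program it generates for a statement via `EXPLAIN`. A small sketch with Python's standard `sqlite3` module follows; the `users` table is made up, and the exact listing varies between SQLite versions.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")

# Each row is one VDBE instruction: addr, opcode, p1, p2, p3, p4, p5, comment.
program = con.execute("EXPLAIN SELECT * FROM users LIMIT 1").fetchall()
opcodes = [row[1] for row in program]

for row in program:
    print(row[0], row[1], row[2], row[3], row[4])

con.close()
```

A different optimization pass would show up here as a different instruction sequence for the same SQL.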

One killer feature I miss from SQLite is table compression. Especially important on various embedded devices where you collect data from sensors or logs.

sqlite3 is 1.6MB while limbo is 6MB; size matters for many low-end but huge-volume embedded boards.

disclosure: I work here. I am happy to answer any questions

tl;dr We are rewriting SQLite in Rust. It uses Asynchronous I/O, considers WASM as first class, and has Deterministic Simulation Testing support from the beginning.

source: https://github.com/tursodatabase/limbo

I am not sure how feasible it is, but can't SQLite be partially rewritten step by step on the main branch instead of being forked?

As the article mentioned, a complete rewrite will not be as stable as the original.

  • SQLite is open-source but not open-contribution – they don't accept contributions of that sort. They follow "cathedral" style development and invented a whole alternative to git for that purpose https://fossil-scm.org/home/doc/43c3d95a/www/fossil-v-git.wi...

    https://www.sqlite.org/copyright.html

    > In order to keep SQLite completely free and unencumbered by copyright, the project does not accept patches. If you would like to suggest a change and you include a patch as a proof-of-concept, that would be great. However, please do not be offended if we rewrite your patch from scratch.

  • As other commenters have already pointed out, SQLite does not take outside contributions.

    We already have a fork, called libSQL. However, the goals of Limbo are far more ambitious and we cannot rewrite some parts step by step. We want to have DST (Deterministic Simulation Testing), a testing methodology pioneered by FoundationDB and TigerBeetle. It is not easy to do that in an existing codebase.

  • > can't SQLite be partially rewritten step by step on the main branch

    Only by the SQLite team. They don't accept contributions of anything other than spelling fixes and such.

  • what do you mean by "on the main branch"? i doubt that migrating from c makes sense for their constraints and expertise, you would not want someone to come into your house and change your furniture. forking is the right political and technical approach for this team. also rust does not support a lot of sqlite target platforms

Limbo has been taken (as the name of a language), so this should be SQuaLor or something...

  • this is a codename, and if the project is to be successful, we don't expect to keep it.

    • Nothing is as permanent as something temporary.

      Congrats on a great new undertaking!

Is there any big open soure, long term, community contributed, in rust?

  • Edit: Is there any big open source project, long term, community contributed, in rust?

As others have said, the “Performance” section is asinine, because they haven't fully implemented 100% of SQLite. Not disclaiming this obvious fact in the “Performance” section is incredibly misleading.

I could trivially write “an SQLite clone” that could execute `SELECT * FROM users LIMIT 1` even faster than either this or SQLite—if that's the only string I accepted as input!

Is there any software that lets me make a graphical user interface to a connected database and build data visualizations, all automatic and interactive?

Like a node editor or spreadsheet? It needs to be suitable for general public

Dunno. Good luck to them, but I never saw a need to rewrite sqlite.

  • Guessing the shortcomings become starker if you’re spending lots of time in the codebase/building a company on top of it.

    • Yeah… Attempting to integrate MVCC and then doing vector search gave enough perspective to do this!

    • > building a company on top of it.

      So be sure you proceed in such a way that never contributes any money or code back to the original project.

  • I do see a need for multiple implementations of SQLite3. First there's the need for multiple implementations for the reasons given by the LibSQL folks, second there's the need for a memory-safe language implementation of SQLite3, and third there's the need for a native language implementation for languages whose runtimes really want not to have C involved (e.g., Go).

    • The fact that there's no alternative implementation of SQLite also seems to play a part in preventing standardization of WebSQL.

      https://www.w3.org/TR/webdatabase/

      "The specification reached an impasse: all interested implementors have used the same SQL backend (Sqlite), but we need multiple independent implementations to proceed along a standardisation path."

I’ll take the faster C version any day over the Rust one. How are those conditional if statements working for you all?

  • The benchmarks in the post indicate that Limbo is more performant than SQLite, not less.

    • > Executing cargo bench on Limbo’s main directory, we can compare SQLite running SELECT * FROM users LIMIT 1 (620ns on my Macbook Air M2), with Limbo executing the same query (506ns), which is 20% faster.

      Faster on a single query, returning a single result, on a single computer. That's not how database performance should be measured or compared.

      In any case, the programming language should have little to no impact on the database performance, since the majority of the time is spent waiting on I/O anyway.
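
      As a side note on the quoted figures, "20% faster" sits between two ways of reading them. The arithmetic below uses only the two numbers from the quote; the variable names are mine.

      ```python
      sqlite_ns = 620  # quoted SQLite time per query, in nanoseconds
      limbo_ns = 506   # quoted Limbo time per query, in nanoseconds

      # Reading 1: how much less time one query takes.
      time_saved = (sqlite_ns - limbo_ns) / sqlite_ns

      # Reading 2: how many more queries fit in the same amount of time.
      speedup = sqlite_ns / limbo_ns - 1

      print(f"{time_saved:.1%} less time per query")      # 18.4%
      print(f"{speedup:.1%} more queries per unit time")  # 22.5%
      ```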

    • This is often true on the journey to reach feature-parity with an original codebase.

      The reason is obvious, of course: it has fewer features, and doing nothing is always faster than doing something.

      Once it's feature complete, then meaningful comparisons can be made. For now, it's puffery.

    • be careful with that, though. In a lot of ways it is still slower.

      The goal with that was just to demonstrate that there's nothing really there that is fundamentally slower, and the perf is already on par in the areas where we have spent cycles.

    • Microbenchmarks are not particularly predictive of performance with real workloads. And there's just one microbenchmark claimed here.