Comment by Retr0id

1 year ago

You can get more total IO throughput (at the cost of latency) by queueing up multiple reads and writes concurrently. You can do this with threads, but io_uring should theoretically be faster (though don't take my word for it; let's wait for benchmarks).
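
A minimal sketch of the thread-based version of "queueing up multiple reads", assuming a Unix-like system (`os.pread` is not available on Windows). Several positional reads are in flight at once, each one blocking its own worker thread. The helper name `read_blocks_concurrently` is hypothetical. io_uring aims for the same overlap by submitting all the reads through one ring instead of parking one thread per request.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def read_blocks_concurrently(path: str, offsets: list[int], block_size: int,
                             max_workers: int = 8) -> list[bytes]:
    """Issue one positional read per offset, with several reads in flight at once.

    Each os.pread call blocks its worker thread, but the OS can service the
    outstanding requests in parallel: per-read latency is not improved, while
    aggregate throughput may be.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            # pread takes an explicit offset, so the threads never race on a
            # shared file position.
            futures = [pool.submit(os.pread, fd, block_size, off) for off in offsets]
            # Results are returned in the same order as `offsets`.
            return [f.result() for f in futures]
    finally:
        os.close(fd)
```

Each individual read waits at least as long as it would alone; any win is in total throughput, which is the trade-off described above.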

I'm personally interested in the potential for async bindings for Python. Making fast async wrappers for blocking APIs in Python-land is painful (although it might improve in the future with nogil).
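
For context, a minimal sketch of the kind of wrapper meant here, using only the standard library. `AsyncConnection` is a hypothetical helper that pushes each blocking sqlite3 call onto a worker thread with `asyncio.to_thread` (Python 3.9+). It keeps the event loop free, but every in-flight call still occupies a thread, which is part of why this approach feels unsatisfying. Third-party libraries such as aiosqlite also rely on a background thread.

```python
import asyncio
import sqlite3


class AsyncConnection:
    """Hypothetical async facade over one blocking sqlite3 connection.

    Every call runs in a worker thread via asyncio.to_thread, so the event
    loop keeps running while SQLite blocks. An asyncio.Lock serializes
    access, because a single sqlite3 connection should not be used from
    several threads at the same time.
    """

    def __init__(self, path: str) -> None:
        # check_same_thread=False: the default executor may run successive
        # calls on different threads; the lock keeps them from overlapping.
        self._conn = sqlite3.connect(path, check_same_thread=False)
        self._lock = asyncio.Lock()

    async def execute(self, sql: str, params: tuple = ()) -> list[tuple]:
        async with self._lock:
            return await asyncio.to_thread(self._run, sql, params)

    def _run(self, sql: str, params: tuple) -> list[tuple]:
        # Runs on a worker thread: execute, fetch all rows, commit any
        # implicit transaction opened by a write statement.
        rows = self._conn.execute(sql, params).fetchall()
        self._conn.commit()
        return rows

    async def close(self) -> None:
        async with self._lock:
            await asyncio.to_thread(self._conn.close)
```

Usage: create the object inside a coroutine, then `await db.execute("SELECT ...")` as with any other awaitable.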

jmull 1 year ago

They had been talking about making the high-level interface to sqlite async (sqlite3_step()).

With io_uring you're talking about the low level, where blocks are actually read and written.

As-is, sqlite is agnostic on that point. It doesn't do I/O directly; instead it goes through an OS abstraction layer called the VFS (Virtual File System). VFS implementations for common platforms are built in, but you can create your own that handles storage IO any way you like, including queuing reads and writes concurrently using io_uring.
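
Python's standard sqlite3 module can't register a custom VFS (third-party bindings such as APSW can implement one in Python), but it can pick an already-registered VFS by name through SQLite's URI filename syntax and its `vfs=` query parameter. A minimal sketch, assuming a Unix-like build where the default VFS is named `unix`; `connect_with_vfs` is a hypothetical helper. A custom VFS registered from C with `sqlite3_vfs_register` would be selected the same way.

```python
import pathlib
import sqlite3
from urllib.parse import quote


def connect_with_vfs(path: str, vfs: str) -> sqlite3.Connection:
    """Open `path` through the VFS registered under the name `vfs`.

    Uses SQLite's URI filename syntax (file:...?vfs=NAME). If no VFS with
    that name is registered, opening fails with sqlite3.OperationalError.
    """
    # Absolute POSIX path, percent-encoded so characters such as '?' or '#'
    # in the filename are not mistaken for URI syntax.
    abs_path = pathlib.Path(path).resolve().as_posix()
    uri = f"file:{quote(abs_path)}?vfs={quote(vfs)}"
    return sqlite3.connect(uri, uri=True)
```

Swapping the storage layer this way leaves everything above it (SQL, `sqlite3_step()`, the Python API) unchanged.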

So that's not a reason to rewrite sqlite.

(In fact, I'd be surprised if they weren't looking at io_uring and, if it seemed likely to improve performance generally, planning to provide an option to use it, either in the existing unix VFS or in some other way.)

> I'm personally interested in the potential for async bindings for Python.

Well, it's perfectly possible to do that with the current sqlite. It may be painful, as you say, but not even remotely at the level of pain a complete rewrite entails.

Retr0id 1 year ago
The VFS interface is synchronous; I don't see how a custom VFS could meaningfully implement asynchronous IO.
> Well, it's perfectly possible to do that with the current sqlite.
If you want to wrap a blocking API in Python with actual parallelism, you have to use multiple processes that communicate with each other. The main advantage of sqlite in the first place is that it's in-process, and you'd lose that.
- jmull 1 year ago
  
  > The VFS interface is synchronous
  On a single thread. There can be multiple threads.
  Of course, leaving a thread idle while waiting for IO isn't great. That's why I mentioned it at the beginning. But idle threads don't seem to have been much of a problem for sqlite in practice, so they wouldn't be much justification for a rewrite.
  > If you want to wrap a blocking API in python, with actual parallelism, you have to use multiple processes
  You can use multiple threads in the same process.
  (Python has some limitations in that respect, but that's not a sqlite issue and can't be fixed by a sqlite rewrite.)
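
A minimal sketch of the multi-threaded, in-process approach, assuming a file-backed database (so separate connections see the same data) and independent read-only queries. Each task opens its own sqlite3 connection, so no connection object is shared across threads. CPython's sqlite3 module releases the GIL while SQLite steps a statement, so these threads can overlap inside SQLite despite the GIL; how much that helps depends on the workload. `run_query` and `run_queries_in_threads` are hypothetical helpers.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor


def run_query(db_path: str, sql: str) -> list[tuple]:
    """Run one query on a private connection opened and closed by this call."""
    conn = sqlite3.connect(db_path)
    try:
        return conn.execute(sql).fetchall()
    finally:
        conn.close()


def run_queries_in_threads(db_path: str, queries: list[str],
                           max_workers: int = 4) -> list[list[tuple]]:
    """Run independent queries concurrently in one process.

    Each task uses its own connection; results come back in the same
    order as `queries`.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_query, [db_path] * len(queries), queries))
```

Everything stays in one process: no serialization or IPC between workers, which is the in-process property the thread is arguing about.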