Comment by formerly_proven

2 years ago

> I've used SQLite blob fields for storing files extensively. Note that there is a 2GB blob maximum: https://www.sqlite.org/limits.html

Also note that SQLite does have an incremental blob I/O API (sqlite3_blob_xxx), so unlike most other RDBMS there is no need to read/write blobs as a contiguous piece of memory - handling large blobs is more reasonable than in those. Though the blob API is still separate from normal querying.

Do you have or know of a clear example of how to do this? I have to ask because I spent half of yesterday trying to make it work. The blob_open command wouldn't work until I set a default value on the blob column and then the blob_write command wouldn't work because you can't resize a blob. It was very weird but I'm pretty confident it's because I'm missing something stupid.

  • I’m pretty sure you can’t resize blobs using this API. The intended usage is to insert/update a row using bind_zeroblob and then update that in place (sans journaling) using the incremental API. It’s a major limitation especially for writing compressed data.

    • Great. I had not seen bind_zeroblob when reading yesterday. Maybe that’s what I needed to get it moving. Thanks.

Most RDBMS's have streaming blob apis:

MS SQL Server: READTEXT, WRITETEXT, substring, UPDATE.WRITE

Oracle: DBMS_LOB.READ, DBMS_LOB.WRITE

PG: Large Objects

Most of my experience is with SQL server and it can stream large objects incrementally through a web app to browser without loading the whole thing into memory at 100's Mbytes/sec on normal hardware.

  • I wasn’t aware of PG Large Objects, though I do know about MSSQL Filestream, which iirc are actual files on disk and remotely transparently accessed via SMB - which in this context would be the moral equivalent of storing just the path in a SQLite db. The functionality you quote seems to be deprecated?

    • SQL server has image, text, varbinary(max) and varchar(max) none of those except varbinary(max) are filestream enabled, and varbinary is only file stream if that is setup and the column is specified file stream when created otherwise its in the DB like the others.

      image and text with READTEXT and WRITETEXT are deprecated but still work fine, varbinary(max) and substring UPDATE.WRITE are the modern equivalent and use the same implementations underneath.

      Filestream allows larger than 2 gigs, stores the blob data in the filesystem but otherwise is accessed like the other blobs along with some special capability like getting a SMB file pointer from the client for direct access. It is also backed up and replicated like normal, definitely not just like storing a path in the DB.

      Filestream performs worse the in db for blobs under about 100kb I believe where it would be recommended to keep them in db for max perf.

      I have used MSSQL blobs long before filestream existed and it works well except for the 2 gig limit and once the db gets large backup and log management get more unwieldy than if you just stored them outside the db but it does keep them transactional consistent, which filestream also does.

I wish the API was compatible with iovec though. As that's what all the c standard lib APIs use for non-contiguous memory

That sounds nice for chunking, but what if you need contiguous memory? E.g. viewing an image or running an AI model

  • You can always use a chunked API to read data into a contiguous buffer. Just allocate a large enough contiguous block of memory, then copy chunk by chunk into that contiguous memory until you've read everything.