
Comment by bob1029

5 years ago

Storing your blobs outside of the database when they are an inherent part of some larger transactional model is the giant rookie mistake, IMO. We did this for a little while for the "oh my gosh look how big the .db is getting" reasons, and it ended up a complete shitshow. Simple tasks like cleaning up old entries can quickly turn into a nightmare if you have to go through the disk and make sure you are not leaving garbage behind. There are also other concerns outside the box, like VM snapshots, which can leave you in a very inconsistent state regarding which blobs you have on disk vs. what the database is expecting.

Is having everything in one gigantic .db file an actual downside? What makes a database's size "unmanageable"? Presumably you'd have to store that business information somewhere anyway, right? I don't understand how one unified file is unmanageable, but splitting that concern into two different domains magically makes it better.

It can trash db performance. A relatively simple alternative is to store references to immutable blobs in the db, so that everything remains transactionally consistent (and easily restorable to any point in time). This is what many large companies do. I happen to work at a company that runs some of the largest and most "shall not be named" databases and has struggled for years to evict all the blobs people shoved in 'em.
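The reference-to-immutable-blob approach above can be sketched roughly like this (a minimal illustration, assuming a content-addressed `blobs/` directory and a hypothetical `files` table, none of which come from the original comment). Because blobs are immutable and named by their content hash, a crash between the file write and the db commit leaves at worst a harmless orphan file, never a dangling reference:

```python
import hashlib
import sqlite3
from pathlib import Path

# Hypothetical layout: blobs live on disk, keyed by SHA-256; the db only
# stores the reference. Names and schema are illustrative.
BLOB_DIR = Path("blobs")
BLOB_DIR.mkdir(exist_ok=True)

def store_blob(conn: sqlite3.Connection, file_name: str, data: bytes) -> str:
    digest = hashlib.sha256(data).hexdigest()
    path = BLOB_DIR / digest
    if not path.exists():
        tmp = path.with_suffix(".tmp")
        tmp.write_bytes(data)
        tmp.rename(path)  # atomic rename on POSIX filesystems
    with conn:  # the reference becomes visible only after the blob exists
        conn.execute(
            "INSERT INTO files (file_name, blob_ref) VALUES (?, ?)",
            (file_name, digest),
        )
    return digest

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files (file_name TEXT PRIMARY KEY, blob_ref TEXT)")
ref = store_blob(conn, "report.pdf", b"example bytes")
row = conn.execute(
    "SELECT blob_ref FROM files WHERE file_name = ?", ("report.pdf",)
).fetchone()
```

Garbage collection then becomes "delete any file whose hash no row references", which can run lazily without ever breaking a live transaction.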

  • If you index on file_name in a table that has file_name and a blob column, why would it trash performance? SQLite can quickly locate where to start reading the blob.
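For reference, the in-db approach the bullet describes looks like this (a minimal sketch; the table and column names are illustrative, not from the thread). With file_name as the primary key, SQLite seeks straight to the row and reads the blob from it:

```python
import sqlite3

# Illustrative schema: blob stored directly in the db, keyed by file_name.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files (file_name TEXT PRIMARY KEY, data BLOB)")
conn.execute("INSERT INTO files VALUES (?, ?)", ("a.bin", b"\x00" * 1024))

# The PRIMARY KEY index makes this lookup a direct seek, not a scan.
(blob,) = conn.execute(
    "SELECT data FROM files WHERE file_name = ?", ("a.bin",)
).fetchone()
```

The lookup itself is indeed fast; the performance concern in the parent comment is more about large blobs inflating the database file and its pages, not about finding the row.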