Comment by TheRealPomax
5 years ago
The great thing about applications that use sqlite as their application file format is that you can write secondary apps and utilities to supplement the applications themselves.
For example, Adobe's Lightroom uses sqlite as their application file format, which means that it's almost trivial to write an application to help it do things it's either really bad/exceedingly slow at (like removing images from a catalogue that no longer exist on disk) or literally can't do (like generating a playlist of files-on-disk for a specific collection or tag), without being locked into whatever janky scripting solution exists in-app. If there even is one to begin with.
All you need is a programming language with a sqlite connector, and you're in the driving seat. And sure, you'll need to figure out the table schemas you need to care about, but sqlite comes with sqldiff so it's really easy to figure out which operations hit which tables with only a few minutes of work.
Good luck reverse engineering proprietary file formats in the same way!
I have an application that uses sqlite as their file format but they've cleared the header so you can't open it directly.
I'd love to be able to make secondary applications like you've described but being enterprise software they don't want to make it too easy.
They obviously want to keep people locked in with their $40k per seat application!
I guess the first step is figuring out the page size and other bits the other meta data you set in the header [1].
I know I just have to sit down and understand the format better and I will eventually figure it out...
[1] https://www.sqlite.org/fileformat.html
That's mean, but you'll just have to live with that. If someone takes active steps towards preventing people from making secondary apps/utils, then they're (a) jerks, and (b) allowed to do that.
Might be worth actually contacting them to ask why, if you can make the case that secondary applications will increase the value of their app, not decrease it.
I'm surprised that nobody's pointed out that there are actually valid reasons other than greed to obscure your file format. It's an implementation detail, not a contract. If customers begin relying on the implementation details, you end up with angry customers when you change the implementation details. A SQLite db without the header is basically a statement saying, "we are using the obvious file format here for our convenience, not for general purpose access. Screw around in here at your own risk."
If you modified their app's internal state db and screwed it up because they have designed their software with certain assumptions that aren't clear from just reading their db schema, that would be a nightmare for them to support. The easiest thing for them to do is just to try to discourage tampering with their internal state.
This is especially true if there's a chance that a market for secondary apps/utils will spring up. If that's to happen and be viable, they absolutely would want to put thought into what their supported interfaces are for those apps/utils, otherwise they will end up painted into a corner and unable to change their architecture without destroying a marketplace.
6 replies →
>That's mean, but you'll just have to live with that
I mean I know we're all on board with the idea of intellectual property actually being a thing now, but surely there are limits? I've seen people take the hard-line stance that if something is your property you should be able to dictate exactly under what situation it can be used, but there have to be limits to IP holders rights on some level, and I feel like reverse engineering a file format is a pretty reasonable place to draw that line.
23 replies →
I can understand why they do it, it's enterprise software so the more open you make it, then the less licences they can sell.
They do have a module you can purchase to run API calls and access their files/software but as you probably guessed that's another $40k license!
Most of my apps I build use this API, but for me to provide to other companies they need them to also buy the API extension.
I'd love to cut out the middle man and I'll do it eventually when I reverse engineer the header!
2 replies →
At $40K/seat, that must be some special software. Simple economics invites competition. Depending on the software, it may not take too many seats to incent the funding of an alternative.
Maybe this is the kind of software that requires huge development costs. But maybe it would be worth 20 seats' worth of customers joining forces to fund a team of 5 people to build you a competing app tailored to your specific needs/wants and completely under your control.
Granted, that could bump your costs from $800K/year to $1.6M+/year. But only short-term. Once your software is production-ready, you drop the costs of your current software. So think of it more like going from $8M/10 years to $6-10M/10 years but having complete control to add the features you want. And perhaps having the opportunity to recoup $millions/year by licensing to others. Or, open source it and give others the same kind of control while benefiting from the features they add. Spread your development costs across more seats to further lower your $/seat.
Or, look at the 100 employees your vendor currently has and lose heart, then hope somebody with deep pockets funds a competitor.
This particular software is for water utilities to model and simulate their water and waste water networks.
It's mostly used so utilites can forecast growth in their areas for the next 25+ years and see the impact on their networks and feed into their capital work projects.
A decently sized utility may spend up to $200M/yr on capital works so $40k isn't even a line item!
There is completion in the market but consultants are forced to use what their clients pick and most utilites aren't that price sensitive.
There are also open source alternatives by the EPA[1][2], and most commercial operators are just wrappers around this public domain software.
I'm trying to create FOSS to help view and run these models.
[1] https://en.m.wikipedia.org/wiki/EPANET
[2] https://en.m.wikipedia.org/wiki/Storm_Water_Management_Model
3 replies →
shrug I used to own a company (since sold) that sold software for $50k/seat. Most people who bought it also bought the automatic failover option, which was another $50k/seat.
The movie business doesn't even blink at that sort of cost if it there's even a small chance to prevent having to set up the remote shot again. The logistics, time, hiring, transport, accommodation, equipment, wages, etc. etc. etc. all make $50k a drop in the ocean.
We spent 2 years writing the software, developing the add-on hardware that helped, and touting it around various Post-production houses. It was used on Star Wars I, The Matrix, etc. Post houses started to take it on board as well. Then we were bought, and the product discontinued. C'est la vie.
1 reply →
I might be wrong but I thought Dropbox used to do that as well. All the sync state is stored in a sqlite file with the header changed or removed.
Completely agree - I wrote about some fun I had with Apple Photos last month: https://simonwillison.net/2020/May/21/dogsheep-photos/
Enjoyed this post a lot and had no idea apple use SQLite u see the hood for a lot of stuff - thanks for sharing!
Most Mac apps are backed by a SQLite database.
It opens up so many possibilities. Nice write-up!
Heck, I wrote https://github.com/Pomax/lightroom-catalog-helper exactly because lrcat is an open format that takes barely any time to figure out, which means the only limit to what you can do with "lightroom" is what your programming skills allow for.
The only thing stopping SQLite from becoming the primary application file format is browsers. Every single other client supports SQLite. Embedded, Mobile, Lightroom, Desktop Apps and Cross platform mobile frameworks have really good support for SQLite.
On the one hand, browsers dislike access to files (regardless of format) for security reasons. On the other hand, browsers use SQLite a lot. For example Firefox uses SQLite at least for IndexedDB (various files) and for history and bookmarks (places.sqlite): https://developer.mozilla.org/en-US/docs/Mozilla/Firefox_Ope...
Wasn't there an effort a few years back to try to create a W3C standard for accessing data that stalled because all of the browsers used SQLite and no one wanted to create a different implementation? IIRC, the standard required two different implementations and SQLite was used by everyone?
2 replies →
Primary application file format for... what kind of files? Because it's certainly not going to (nor should it ever) replace hypertext documents.
Being able to load a read-only .sqlite database might seem cool, but I can't think of a single instance in which that's smaller and/or more efficient than calling data endpoints that use gzip/brotli for transport.
Or are you thinking "as replacement for IndexedDB"? In which case, hard agree but then it's an actual database, not used as file data container.
> sqldiff
You just blew my mind. Thank you!
I can't believe the things I find out from random HN comments.
I tried to do that with a Firefox SQLite database files but it had an exclusive lock.
This is the case for any application that constantly writes to their dbs, so you usually run your own application/utility on them when the "parent" application isn't running.
You can usually work around that by creating a copy - I run "cp /path/to/locked.db /tmp" all the time for this.
Make sure to also copy locked.db-wal if it's present. (Is there a better way to do all this? I feel like there really ought to be a better way.)
Is it any different from JSON or XML?
that's like asking whether .zip is any different from .txt - yes, it's rather quite a lot different. A databases--that-is-a-file-on-disk, used as application file format for containing arbitrary -and arbitrarily complex- data is almost nothing like a plain text markup document.
Databases, XML and JSON are structured formats. They contain embedded metadata (tag names/keys/table&column names) so reverse-engineering is much easier.
I don't really see any difference or advantage of using SQLite over XML, unless you need full RDBMS engine power for your configuration (highly unlikely).
2 replies →
It's very annoying when the applications use sqlite with encryption. That is not open source, so regular language bindings can not be used.
I'm not sure how slqite as your application file format has anything to do with open source, honestly. It's certainly super easy for open source projects to use, but a closed source proprietary application using an encrypted sqlite file sounds perfectly sensible to me?
We all win when folks decide to leave it accessible, but I'm not going to hold "encrypting a file format so that people can't easily reverse engineer it" against folks who are trying to sell software.
It looks like the actual library to read/write an encrypted SQLite DB is licensed separately from the open source SQLite[1], so even if you have the encryption key, you can't access the encrypted format:
1: https://www.sqlite.org/see/doc/release/www/readme.wiki
> I'm not going to hold "encrypting a file format so that people can't easily reverse engineer it" against folks who are trying to sell software.
I think I'd mostly disagree. Selling an application is one thing, but the data itself is usually customers' and holding their data hostage is not a proper thing to do.
4 replies →
I guess he/she meant that the extension that encrypts the database is proprietary. I believe that was the only option for a while but now there is an open source alternative.
2 replies →