Comment by lmm

5 years ago

It's not a single giant pickle dump; each individual object gets pickled and stored in Minerva (which works more or less like Cassandra or something). It's a pretty similar high-level design to what the likes of Google or Facebook do, where you store everything as protobufs in BigTable. The bank uses pickle rather than protobuf because they put a higher priority on being able to store arbitrary objects and deal with robustness/compatibility later, rather than having to write a proto definition and a bunch of mapping code up front. You wouldn't want to use a relational database because they're not properly distributed (and, frankly, kind of bad and overrated).
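To make the "pickle each object, store it under a key" idea concrete, here is a minimal sketch. It is not Minerva's API: `ObjectStore`, `put`, `get` and the key names are hypothetical, and a plain dict stands in for the distributed backend.

```python
import pickle
from datetime import date
from decimal import Decimal


class ObjectStore:
    """Toy stand-in for a distributed key-value store (hypothetical API)."""

    def __init__(self):
        self._rows = {}  # key -> pickled bytes, one row per object

    def put(self, key, obj):
        # No schema or mapping code up front: anything picklable can be stored.
        self._rows[key] = pickle.dumps(obj)

    def get(self, key):
        # Each object is unpickled on its own; nothing else is loaded.
        return pickle.loads(self._rows[key])


store = ObjectStore()
store.put("trades/1", {"symbol": "ACME", "qty": 100, "settle": date(2021, 3, 1)})
store.put("prices/ACME", Decimal("12.50"))

trade = store.get("trades/1")
price = store.get("prices/ACME")
```

The trade-off described above shows up here: `put` accepts any picklable object with no definition written first, but nothing checks that an old row still unpickles sensibly after the code that produced it changes.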

The Minerva I worked on was temporal and append-only, like an HBase that never did compactions. So "delete" actually just writes a tombstone row at a particular timestamp (there was an "obliterate" command, but you needed special authorization to use that). It was also distributed (with availability zones even), so you didn't really worry about losing data; loading data as-of a particular timestamp was part of every query (and implemented efficiently). There were probably regular dumps somewhere too, but I never needed to encounter those.
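A rough sketch of the temporal, append-only behaviour described above, again with made-up names (`TemporalStore`, `put`, `delete`, `get`, integer timestamps) rather than anything from the real system. Writes only ever append a version, delete appends a tombstone, and reads take an as-of timestamp and binary-search the version list.

```python
import bisect
import pickle


class TemporalStore:
    """Toy append-only store with as-of reads (hypothetical, in-memory)."""

    def __init__(self):
        self._times = {}     # key -> ascending list of write timestamps
        self._payloads = {}  # key -> pickled bytes per version, None = tombstone

    def _append(self, key, ts, payload):
        times = self._times.setdefault(key, [])
        if times and ts <= times[-1]:
            raise ValueError("writes must move forward in time")
        times.append(ts)
        self._payloads.setdefault(key, []).append(payload)

    def put(self, key, obj, ts):
        self._append(key, ts, pickle.dumps(obj))

    def delete(self, key, ts):
        # Nothing is removed; a tombstone version is appended instead.
        self._append(key, ts, None)

    def get(self, key, as_of):
        times = self._times.get(key, [])
        # Index of the first version written strictly after `as_of`.
        i = bisect.bisect_right(times, as_of)
        if i == 0:
            raise KeyError(key)  # key did not exist yet at `as_of`
        payload = self._payloads[key][i - 1]
        if payload is None:
            raise KeyError(key)  # latest version at `as_of` is a tombstone
        return pickle.loads(payload)


store = TemporalStore()
store.put("limits/desk-7", {"max": 100}, ts=10)
store.put("limits/desk-7", {"max": 250}, ts=20)
store.delete("limits/desk-7", ts=30)

before_change = store.get("limits/desk-7", as_of=15)
after_change = store.get("limits/desk-7", as_of=25)
```

Reading with `as_of=35` raises `KeyError` because of the tombstone, while the two earlier versions are still there for any query that asks about an earlier time.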

So Minerva is like a distributed datastore, specifically for Python object storage?

Interesting. Do you think you would do this today with Cassandra/HBase? Can it be done with, say, Python 3.10 and the latest Cassandra (or even better, something like Firebase or Cloud Spanner)?

Just curious whether, in a post-AWS/Firebase world, something like Minerva can be built without investing in writing the DB store from the ground up.

  • The incarnation of Minerva I worked on actually used Cassandra as its storage backend. But it's something that's not particularly useful piecemeal; the great value of Minerva is that all the bank's data is there and it's all temporal, all access-controlled and all the rest. The most fragile and cumbersome parts of Minerva are the parts where it integrates with an external/legacy datastore - but if you tried to introduce a Minerva-style datastore as a small piece in a system that was otherwise using a "normal" technology stack, those integrations would be most of what you made.