Comment by titzer

3 years ago

Last year I did some consulting for a client using Google Cloud services such as Spanner and Cloud Storage. They were storing and indexing mostly timeseries data, with a custom index for specific types of queries. It was difficult for them to define a schema that could handle the write bandwidth needed for their ingestion. In particular, it required a careful hashing scheme to balance load across the shards of the various tables. (It seems to be a pattern with many databases to suck at append-often, read-very-often workloads, like logs.)
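To illustrate the kind of hashing scheme meant here: a common way to avoid hot-spotting on monotonically increasing timestamps is to prefix each row key with a shard number derived from a hash of the series ID. The sketch below is not the client's actual scheme; the class name, `NUM_SHARDS`, and the key layout are all assumptions made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Hypothetical sketch: spread time-ordered writes across shards by
// prefixing the row key with a hash-derived shard number.
public final class ShardedKey {
    // Assumed shard count; a real value would be tuned to the table.
    public static final int NUM_SHARDS = 16;

    // Maps a series ID to a stable shard in [0, NUM_SHARDS).
    public static int shardFor(String seriesId) {
        CRC32 crc = new CRC32();
        crc.update(seriesId.getBytes(StandardCharsets.UTF_8));
        // getValue() is an unsigned 32-bit value held in a long, so it is never negative.
        return (int) (crc.getValue() % NUM_SHARDS);
    }

    // Builds a key of the form "<shard>/<seriesId>/<timestamp>".
    // Zero-padding keeps keys for one series sorted by time.
    public static String rowKey(String seriesId, long timestampMillis) {
        return String.format("%02d/%s/%013d",
                shardFor(seriesId), seriesId, timestampMillis);
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        System.out.println(rowKey("sensor-1", now));
        System.out.println(rowKey("sensor-2", now));
    }
}
```

Writes for different series land under different prefixes instead of all piling onto the newest end of the key range, while a time-range scan for one series still touches a single contiguous prefix.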

We designed some custom in-memory data structures in Java but also used some of the standard high-performance concurrent data structures. Some reader/writer locks. gRPC and some pub/sub to get updates on the order of a few hundred or thousand qps. In the end, we ended up with JVM instances that had memory requirements in the 10GB range. Replicate that 3-4x for failover, and we could serve queries at higher rates and lower latency than hitting Spanner. The main thing cloud was good for was the storage of the underlying timeseries data (600GB maybe?) for fast server startup, so that they could load the index off disk in less than a minute. We designed a custom binary disk format to make that blazingly fast, and then just threw binary files into a cloud filesystem.
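For a rough idea of what an in-memory index guarded by reader/writer locks can look like, here is a minimal sketch. It is not the structure we actually built; `SeriesIndex`, its methods, and the choice of a `TreeMap` per series are assumptions for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch: per-series sorted points, each series guarded
// by its own reader/writer lock so that reads rarely block each other.
public final class SeriesIndex {
    private static final class Series {
        final NavigableMap<Long, Double> points = new TreeMap<>();
        final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    }

    private final Map<String, Series> bySeries = new ConcurrentHashMap<>();

    // Appends (or overwrites) one point; takes the series' write lock.
    public void append(String seriesId, long timestampMillis, double value) {
        Series s = bySeries.computeIfAbsent(seriesId, k -> new Series());
        s.lock.writeLock().lock();
        try {
            s.points.put(timestampMillis, value);
        } finally {
            s.lock.writeLock().unlock();
        }
    }

    // Returns values with fromMillis <= timestamp <= toMillis, in time order.
    // Assumes fromMillis <= toMillis; subMap throws otherwise.
    public List<Double> range(String seriesId, long fromMillis, long toMillis) {
        Series s = bySeries.get(seriesId);
        if (s == null) {
            return new ArrayList<>();
        }
        s.lock.readLock().lock();
        try {
            // Copy under the lock so callers never see a live view.
            return new ArrayList<>(
                    s.points.subMap(fromMillis, true, toMillis, true).values());
        } finally {
            s.lock.readLock().unlock();
        }
    }
}
```

Many readers can hold the read lock on a series at once, and a writer only blocks readers of the series it is updating, which suits a read-heavy workload with a steady trickle of appends.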

If you need to serve < 100GB of data and most of it is static... IMHO, screw the cloud, use a big server and replicate it for failover. Unless you've got really high write rates or seriously stringent transactional requirements, then man, a couple of servers will do it.

YMMV, but holy crap, servers are huge these days.

I find disk I/O to be a primary reason to go with bare metal. The VM abstractions just kill I/O performance. In a single server you can fill up the PCIe lanes with flash and hit some ridiculous throughput numbers.

When you say “screw the cloud”, do you mean “administer an EC2 machine yourself” or really “buy your own hardware”?

  • The former, mostly. You don't necessarily have to use EC2, but that's easy to do. There are many other, smaller providers if you really want to get out from under the big 3. I have no experience managing hardware, so I personally wouldn't take that on myself.