
Comment by CoolGuySteve

5 years ago

Situations where wire/disk bandwidth are constrained are usually better served by compressing the entire stream rather than trying to integrate some run encoding into the message format itself.

You only pay for decompression once, to load the message into RAM, rather than being forced to either make a copy or pay for decoding throughout the program whenever fields are accessed. And if the link is bandwidth-constrained, the added latency of decompression is probably negligible.

The separation of concerns between compression format and encoding also allows specifically tuned compression algorithms to be used, for example switching among zstd's many compression levels. Separating compression from encoding also lets you compress/decompress on another processor core for higher throughput.

Meanwhile you can also do a one-shot decompression, or skip compression of a stream entirely, for replay/analysis; when talking over a low-latency, high-bandwidth link/IPC; or when serializing to/from an already-compressed filesystem like btrfs+zstd/lzo.

It's just more flexible this way with negligible drawbacks.

Recently I've been looking at CapnProto, which is a fixed-offset/size field encoding that allows for zero-copy/zero-allocation decoding, and arena allocation during message construction.

One nice design choice it has is to make default values zero on the wire by XOR'ing each integral field with that field's default value.

This composes well with another nice feature it has: an optional run-length-style packed encoding that compresses those zero bytes down. Overall, not quite msgpack efficiency, but still very good.

One even more awesome feature is that you can unpack the packed encoding without access to the original schema.

Overall I think it's a well designed and balanced feature set.

  • I’ve been using CapnProto, and while I like it, it certainly has a small community, and support can suffer because of that. I haven’t tried it, but I have heard good things about flatbuffers, and would definitely give it a second look if I were making the decision again.

    • For me the clinchers over flatbuffers are:

      - No need to build messages 'inside out' / leaves first to get zero copy behaviour. This is huge.

      - Fully functional RPC implementation with really interesting latency hiding behaviour via asynchronous promises. It's got most of what I want from gRPC without the horrid googlisms and dependencies.

      - CapnProto has a canonical encoding, which makes it suitable for hashing and comparison.

      - It has a packed encoding, whereas with Flatbuffers you're going to have to compress with a more hardcore codec like zlib.

      - CapnProto supports runtime reflection via a dynamic API. This is useful in cases where you really need it, like when creating generic bindings for dynamically typed languages or use cases. Like protobufs it has a "lite" mode to opt out of this.

      - CapnProto JSON support in the library is better, probably due to the above.

      Major cons:

      - Ugly schema format.

      - Not header only (I've not tested whether there's actually a build-time upside to that)

      - The "kj" support library is a bit of a pain in the ass, refusing to use common C++ vocabulary types like string_view or to support common STL data structures like vectors or strings. Writing a message into a std::vector, for instance, requires a specialization of an interface.

  • A few years back, I actually compared the three serialization formats. Ironically, for the data I used, raw struct came out on top in every benchmark: https://cloudef.pw/protobug.png

    • This matches my experience as well. You should probably add code size to the graphs. Raw struct probably wins there too.

      The only thing I’ve seen that’s competitive with raw struct is writing a C++ template visitor-style pattern to walk the struct fields using inline methods. (This can be achieved in some other compiled languages, of course.)

      It lets you compress things as you walk the struct, and breaks the dependency on struct layout (which can matter for cross-platform stuff, and for versioned protocols). It’s not quite as fast as a memcpy off the network buffer, but it can be branch and virtual dispatch free (not counting branches in the compression logic).

      Also, it can validate as it copies the data into the struct, so now you can auto-generate application-specific validation logic. This is a big deal, since validating the output of a deserializer is almost as hard as deserializing by hand!

      I really liked the article, though. Forcing tail-call optimization in the C++ template serializer sounds like it will substantially improve the (worst-case) generated assembly.