Comment by CoolGuySteve

5 years ago

Protobuf's abysmal performance, questionable integration into the C++ type system, append-only expandability, and annoying naming conventions and default values are why I usually try to steer away from it.

As a lingua franca between interpreted languages it's about par for the course, but you'd think the fast languages would get the fast path (i.e., zero parsing/marshalling overhead in Rust/C/C++, no allocations), since you're usually not writing in those languages for fun but because you need the thing to be fast.

It's also the kind of choice that comes back to bite you years into a project if you started with something like Python and then need to rewrite a component in a systems language to make it faster. Now you not only have to rewrite your component but change the serialization format too.

Unfortunately Protobuf gets a ton of mindshare because nobody ever got fired for using a Google library. IMO it's just not that good and you're inheriting a good chunk of Google's technical debt when adopting it.

Zero-parse wire formats definitely have benefits, but they also have downsides, such as significantly larger payloads, more constrained APIs, and typically more constraints on how the schema can evolve. They also have a wire size proportional to the size of the schema (declared fields) rather than to the size of the data (present fields), which makes them unsuitable for some of the cases where protobuf is used.

With the techniques described in this article, protobuf parsing speed is reasonably competitive, though if your yardstick is zero-parse, it will never match up.

  • Situations where wire/disk bandwidth is constrained are usually better served by compressing the entire stream rather than by trying to integrate some run-length encoding into the message format itself.

    You only need to pay for decompression once, to load the message into RAM, rather than being forced to either make a copy or pay for decoding throughout the program whenever fields are accessed. And if the link is bandwidth-constrained, the added latency of decompression is probably negligible.

    The separation of concerns between compression format and encoding also allows specifically tuned compression algorithms to be used, for example by switching between zstd's many compression levels. Separating compression from encoding also lets you compress/decompress on another processor core for higher throughput.

    Meanwhile you can also do a one-shot decompression, or skip compression of a stream entirely: for replay/analysis, when talking over a low-latency, high-bandwidth link/IPC, or when serializing to/from an already-compressed filesystem like btrfs+zstd/lzo.

    It's just more flexible this way with negligible drawbacks.
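
    A rough sketch of that split, assuming a hypothetical MyMessage protobuf type and libzstd (error handling kept minimal):

        // Encode with protobuf, then compress the whole stream with zstd.
        // The compression level is tunable independently of the wire format.
        #include <string>
        #include <vector>
        #include <zstd.h>

        std::vector<char> serialize_compressed(const MyMessage& msg, int level) {
            std::string raw;
            msg.SerializeToString(&raw);                       // encoding concern

            std::vector<char> out(ZSTD_compressBound(raw.size()));
            size_t n = ZSTD_compress(out.data(), out.size(),   // compression concern,
                                     raw.data(), raw.size(),   // e.g. level 3 vs 19
                                     level);
            if (ZSTD_isError(n)) out.clear();                  // minimal error handling
            else out.resize(n);                                // pay for this once per
            return out;                                        // message, not per field
        }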

    • Recently I've been looking at CapnProto, which is a fixed-offset/size field encoding that allows zero-copy, zero-allocation decoding, and arena allocation during message construction.

      One nice design choice it makes is to render default values as zero on the wire by XOR'ing every integral field with that field's default value.

      This composes well with another nice feature: an optional run-length-style packed encoding that compresses those zero bytes down. Overall, not quite msgpack efficiency, but still very good.

      One even more awesome feature is that you can unpack the packed encoding without access to the original schema.

      Overall I think it's a well designed and balanced feature set.
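
      A toy illustration of the XOR-with-default idea (just the concept, not Cap'n Proto's actual code):

          #include <cstdint>
          #include <cstdio>

          // A field whose value equals its schema default encodes as all-zero
          // bytes, which the packed encoding then squeezes down to almost nothing.
          uint32_t encode_field(uint32_t value, uint32_t schema_default) {
              return value ^ schema_default;   // default -> 0x00000000 on the wire
          }

          uint32_t decode_field(uint32_t wire, uint32_t schema_default) {
              return wire ^ schema_default;    // XOR is its own inverse
          }

          int main() {
              const uint32_t kDefault = 42;
              printf("%08x\n", encode_field(42, kDefault));   // 00000000
              printf("%08x\n", encode_field(43, kDefault));   // 00000001
              printf("%u\n",   decode_field(0x1u, kDefault)); // 43
          }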

We jumped from protobuf -> Arrow at the very beginning of Arrow (e.g., we wrote on the main language implementations), and haven't looked back :)

If you're figuring out serialization from scratch nowadays, for most apps, I'd definitely start by evaluating Arrow. A lot of the benefits of protobuf, and then some.

Protobuf itself as a format isn't that bad; it's the default implementations that are bad: slow compile times, code bloat, and clunky APIs/conventions. Nanopb is a much better implementation and gives you more control over code generation too. Protobuf makes sense for large data, but for small data, fixed-length serialization with compression applied on top would probably be better.
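
For reference, a minimal nanopb encode sketch - this is nanopb's plain-C API (usable from C or C++), with SimpleMessage standing in for a hypothetical generated type from a one-field .proto:

    #include <pb_encode.h>
    #include "simple.pb.h"   // generated by nanopb's protoc plugin

    // message SimpleMessage { int32 lucky_number = 1; }
    bool encode_simple(uint8_t *buffer, size_t buflen, size_t *written) {
        SimpleMessage msg = SimpleMessage_init_zero;   // plain struct, no heap allocation
        msg.lucky_number = 13;

        pb_ostream_t stream = pb_ostream_from_buffer(buffer, buflen);
        if (!pb_encode(&stream, SimpleMessage_fields, &msg))
            return false;                              // details in PB_GET_ERROR(&stream)
        *written = stream.bytes_written;
        return true;
    }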

It's obviously possible to do protobuf with zero parsing/marshalling if you stick to fixed-length messages and 4/8-byte fields. Not saying that's a good idea, since there are simpler binary encodings out there when you need that kind of performance.

  • This is incompatible with protobuf. Protobuf has variable length encodings for all its integers, including field tags.

    https://developers.google.com/protocol-buffers/docs/encoding

    • Actually you have both 32 and 64 bit wire types:

          - wire_type=1 64 bit: fixed64, sfixed64, double
          - wire_type=5 32 bit: fixed32, sfixed32, float
      

      Consider a valid protobuf message with such a field. If you can locate the field's value bytes, you can write a new value to the same location without breaking the message. It's obviously possible to do the same with the varint type too, as long as you don't change the number of bytes - not so practical, but useful for an enum field with a limited set of useful values (usually fewer than 128).

      Pregenerating the protobuf messages you want to send and then modifying the bytes in place before sending will give you a nice performance boost over "normal" protobuf serialization. It can be useful if you need to be protobuf-compatible, but it's obviously better to use something like SBE - https://github.com/real-logic/simple-binary-encoding
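
      Rough sketch of that pregenerate-and-patch trick, assuming a hypothetical Quote message with a fixed64 price field and a little-endian host (so the in-memory bytes match the wire bytes):

          #include <cstdint>
          #include <cstring>
          #include <string>

          std::string template_bytes;   // serialized once up front
          size_t price_offset;          // byte offset of the fixed64 value

          void init_template() {
              Quote q;                                // hypothetical generated type
              q.set_symbol("AAPL");
              q.set_price(0xDEADBEEFCAFEF00Dull);     // distinctive sentinel value
              q.SerializeToString(&template_bytes);

              // Assumes the sentinel bytes appear exactly once in the message.
              uint64_t sentinel = 0xDEADBEEFCAFEF00Dull;
              price_offset = template_bytes.find(std::string(
                  reinterpret_cast<const char*>(&sentinel), sizeof sentinel));
          }

          void send_quote(uint64_t price) {
              // fixed64 is little-endian and fixed-width on the wire, so an 8-byte
              // overwrite never changes the message length or breaks framing.
              std::memcpy(&template_bytes[price_offset], &price, sizeof price);
              // write(fd, template_bytes.data(), template_bytes.size()); ...
          }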

FWIW, the Python protobuf library defaults to using the C++ implementation via bindings. So even though this is a blog post about implementing protobuf in C, it can also help implementations in other languages.

But yes, once you want really high performance, protobuf will disappoint you when you benchmark and find it responsible for all the CPU use. What are the options for reducing parsing overhead? flatbuffers? xdr?