← Back to context

Comment by ognyankulev

4 days ago

The fact that the long article misses to make the historical/continuation link to MessagePack is by itself a red flag signalling a CBOR ad.

Edit: OK, actually there is a separate page for alternatives: https://cborbook.com/introduction/cbor_vs_the_other_guys.htm...

Notably missing is a comparison to Cap'n Proto, which to me feels like the best set of tradeoffs for more binary interchange needs.

I honestly wonder sometimes if it's held back by the name— I love the campiness of it, but I feel like it could be a barrier to being taken seriously in some environments.

  • Doesn't Cap'n Proto require the receiver to know the types for proper decoding? This wouldn't entirely disqualify it from comparison, since e.g. protobufs are that way as well, but they make it less interesting for comparing to CBOR, which is type-tagged.

    • There's quite a few formats that are self-describing already, so having a format that can skip the type and key tagging for that extra little bit of compactness and decoding efficiency is a unique selling point.

      There's also nothing stopping you from serializing unstructured data using an array of key/value structs, with a union for the value to allow for different value types (int/float/string/object/etc), although it probably wouldn't be as efficient as something like CBOR for that purpose. It could make sense if most of the data is well-defined but you want to add additional properties/metadata.

      Many languages take unstructured data like JSON and parse them into a strongly-typed class (throwing validation errors if it doesn't map correctly) anyways, so having a predefined schema is not entirely a bad thing. It does make you think a bit harder about backwards-compatibility and versioning. It also probably works better when you own the code for both the sender and receiver, rather than for a format that anyone can use.

      Finally, maybe not a practical thing and something that I've never seen used in practice: in theory you could send a copy of the schema definition as a preamble to the data. If you're sending 10000 records and they all have the same fields in the same order, why waste bits/bytes tagging the key name and type for every record, when you could send a header describing the struct layout. Or if it's a large schema, you could request it from the server on demand, using an id/version/hash to check if you already have it.

      In practice though, 1) you probably need to map the unknown/foreign schema into your own objects anyways, and 2) most people would just zlib compress the stream to get rid of repeated key names and call it a day. But the optimizer in me says why burn all those CPU cycles decompressing and decoding the same field names over and over. CBOR could have easily added optional support for a dictionary of key strings to the header, for applications where the keys are known ahead of time, for example. (My guess is that they didn't because it would be harder for extremely-resource-constrained microcontrollers to implement).

  • > I feel like it could be a barrier to being taken seriously in some environments.

    Working as intended. ;)

    • In all seriousness: I develop Cap'n Proto to serve my own projects that use it, such at the Cloudflare Workers runtime. It is actually not my goal to see Cap'n Proto itself adopted widely. I mean, don't get me wrong, it'd be cool, but it isn't really a net benefit to me personally: maybe people will contribute useful features, but mostly they will probably just want me to review their PRs adding features I don't care about, or worse, demand I fix things I don't care about, and that's just unpaid labor for me. So mostly I'm happy with them not.

      It's entirely possible that this will change in the future: like maybe we'll decide that Cap'n Proto should be a public-facing part of the Cloudflare Workers platform (rather than just an implementation detail, as it is today), in which case adoption would then benefit Workers and thus me. At the moment, though, that's not the plan.

      In any case, if there's some company that fancies themselves Too Serious to use a technology with such a silly name and web site, I am perfectly happy for them to not use it! :)

      1 reply →