Comment by lifthrasiir
5 months ago
To be fair, CBOR proper is amazingly well designed given its constraints and design-by-committee nature. Its design is so regular that it is not even hard to keep the whole specification in your head. Unfortunately, I can't say the same for the rest of the CBOR ecosystem; many related specs show varying signs of bloat. I recently criticized the packed CBOR draft heavily because I couldn't make any sense of it [1], and Bormann seemed to have clearly missed most of my points.
[1] https://mailarchive.ietf.org/arch/msg/cbor/qdMZwu-CxHT5XP0nj...
Disclaimer: I wrote and maintain a MessagePack implementation.
To be uncharitable, that's probably because CBOR's initial design was lifted from MP, and everything Bormann added to it was pretty bad. This snippet from your great post captures it pretty well I think:
---
CBOR records the number of nested items and thus has to maintain a stack to skip to a particular nested item.
Alternatively, we can define "processability" to include only a particular set of operations. Statement 3c implies, and 3d seems to confirm, that it should include space-constrained decoding, but even that is quite vague. For example,
- Can we assume we have enough memory to buffer the whole packed CBOR data item? If we can't, how many past bytes can we keep during the decoding process?
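Here is a minimal Python sketch of the quoted stack point. It is not taken from either post, and `read_head` and `skip_item` are hypothetical helper names. It assumes definite-length items only. Skipping a nested array still means visiting every head inside it, with a stack of remaining item counts per open level:

```python
def read_head(buf: bytes, pos: int) -> tuple[int, int, int]:
    """Decode the item head at pos and return (major_type, argument, next_pos).

    Sketch only: definite lengths are assumed, so additional-information
    values 28-31 (reserved, indefinite length, "break") are rejected.
    """
    initial = buf[pos]
    major, info = initial >> 5, initial & 0x1F
    pos += 1
    if info < 24:
        return major, info, pos
    if info > 27:
        raise ValueError("indefinite length or reserved value: not handled here")
    n = 1 << (info - 24)  # 24, 25, 26, 27 -> 1, 2, 4, 8 argument bytes
    if pos + n > len(buf):
        raise ValueError("truncated input")
    return major, int.from_bytes(buf[pos:pos + n], "big"), pos + n


def skip_item(buf: bytes, pos: int) -> int:
    """Return the offset just past the data item that starts at pos.

    Every nested head is visited; `stack` holds how many items remain
    to be consumed at each open nesting level.
    """
    stack = [1]
    while stack:
        major, arg, pos = read_head(buf, pos)
        stack[-1] -= 1
        if major in (2, 3):      # byte/text string: jump over the payload
            pos += arg
            children = 0
        elif major == 4:         # array: `arg` nested items
            children = arg
        elif major == 5:         # map: `arg` key/value pairs
            children = 2 * arg
        elif major == 6:         # tag: exactly one tagged item follows
            children = 1
        else:                    # ints, simple values, floats: head only
            children = 0
        if children:
            stack.append(children)
        while stack and stack[-1] == 0:  # close every finished level
            stack.pop()
    return pos
```

For example, `skip_item(bytes.fromhex("8301820203626162"), 0)` returns 8, the size of `[1, [2, 3], "ab"]`. It only gets there by walking through the inner array.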
> To be uncharitable, that's probably because CBOR's initial design was lifted from MP, and everything Bormann added to it was pretty bad.
To be clear, I disagree: I believe Bormann did make a great addition by forking. I can show this right away, because my own point can be addressed entirely within CBOR itself.
CBOR tags are of course not required to be processed at all, but some common tags are useful enough that many implementations are expected to support them. One example is tag 24, "Encoded CBOR data item" (Section 3.4.5.1), which indicates that the enclosed byte string is itself encoded CBOR. Since that string carries its size in bytes, every array or map can be wrapped in such a tag to make it easily skippable. [1] This could be made a formal rule if the supposed processability is highly desirable. And given that these tags were defined so early, my design sketch must have been anticipated in advance, which is why I believe CBOR is indeed the better design.
[1] Alternatively, RFC 8742 CBOR sequences (tag 63) can be used to emulate an array or map of indeterminate size.
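Below is a rough sketch of the tag 24 idea. It uses a toy encoder that handles only non-negative integers, text strings and arrays, and wraps every array. The helper names are hypothetical. Because the wrapper is a byte string with a known length, a reader can skip any wrapped array after reading just two heads:

```python
def head(major: int, arg: int) -> bytes:
    """Encode an item head using the shortest argument form."""
    if arg < 24:
        return bytes([major << 5 | arg])
    for info, n in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if arg < 1 << (8 * n):
            return bytes([major << 5 | info]) + arg.to_bytes(n, "big")
    raise ValueError("argument does not fit in 64 bits")


def read_head(buf: bytes, pos: int) -> tuple[int, int, int]:
    """Return (major_type, argument, next_pos); definite lengths only."""
    initial = buf[pos]
    major, info = initial >> 5, initial & 0x1F
    if info < 24:
        return major, info, pos + 1
    if info > 27:
        raise ValueError("indefinite length or reserved value")
    n = 1 << (info - 24)
    return major, int.from_bytes(buf[pos + 1:pos + 1 + n], "big"), pos + 1 + n


def encode_wrapped(value) -> bytes:
    """Encode non-negative ints, str and lists. Every array becomes
    tag 24 (Encoded CBOR data item) around a byte string."""
    if isinstance(value, int) and value >= 0:
        return head(0, value)
    if isinstance(value, str):
        data = value.encode("utf-8")
        return head(3, len(data)) + data
    if isinstance(value, list):
        inner = head(4, len(value)) + b"".join(encode_wrapped(v) for v in value)
        return head(6, 24) + head(2, len(inner)) + inner
    raise TypeError(f"unsupported type: {type(value).__name__}")


def skip_wrapped(buf: bytes, pos: int) -> int:
    """Skip a tag-24-wrapped item by reading two heads, however deep it is."""
    major, tag, pos = read_head(buf, pos)
    if (major, tag) != (6, 24):
        raise ValueError("expected tag 24")
    major, length, pos = read_head(buf, pos)
    if major != 2:
        raise ValueError("tag 24 must enclose a byte string")
    return pos + length
```

Here `encode_wrapped([1, [2, 3]])` yields `d818488201d81843820203`. `skip_wrapped` jumps over the whole 11-byte item without looking inside it.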
Sure, I think CBOR's "suggested" tags (or whatever they are) are probably useful to most people. The tradeoff is that they create pressure for implementations to support them, and that's not free. For example, bignum libraries are pretty heavyweight; they're not really the kind of thing you'd want to include in a C implementation as a dependency, especially when very few of your users will use them. Well OK, now you have a choice between:
- include it anyway: bloat your library for almost everyone, maybe support several underlying implementations, and manage all those dependencies forever; those libraries also differ in how they set precision and whether they allocate statically or dynamically, etc., so you have to expose that somehow
- don't include it: you're now probably incompatible with every dynamic-language implementation that gets bignums for free, and you should say so up front
This is just one example, but it's pretty representative of Bormann's "have your cake and eat it too" design instincts where he tosses on features and doesn't consider the tradeoffs.
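This sketch shows why dynamic-language implementations "get bignums for free". In Python, `int` is arbitrary precision, so tags 2 and 3 (unsigned and negative bignums, each wrapping a big-endian byte string) take only a few lines. The helper names are hypothetical. Following RFC 8949's preferred serialization, values that fit in the 64-bit argument are emitted as plain major type 0/1 integers instead:

```python
def head(major: int, arg: int) -> bytes:
    """Encode an item head using the shortest argument form."""
    if arg < 24:
        return bytes([major << 5 | arg])
    for info, n in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if arg < 1 << (8 * n):
            return bytes([major << 5 | info]) + arg.to_bytes(n, "big")
    raise ValueError("argument does not fit in 64 bits")


def read_head(buf: bytes, pos: int) -> tuple[int, int, int]:
    """Return (major_type, argument, next_pos); definite lengths only."""
    initial = buf[pos]
    major, info = initial >> 5, initial & 0x1F
    if info < 24:
        return major, info, pos + 1
    if info > 27:
        raise ValueError("indefinite length or reserved value")
    n = 1 << (info - 24)
    return major, int.from_bytes(buf[pos + 1:pos + 1 + n], "big"), pos + 1 + n


def encode_int(n: int) -> bytes:
    """Encode any Python int, falling back to tag 2/3 bignums when needed."""
    if 0 <= n < 1 << 64:
        return head(0, n)
    if -(1 << 64) <= n < 0:
        return head(1, -1 - n)
    tag, magnitude = (2, n) if n >= 0 else (3, -1 - n)
    data = magnitude.to_bytes((magnitude.bit_length() + 7) // 8, "big")
    return head(6, tag) + head(2, len(data)) + data


def decode_int(buf: bytes) -> int:
    """Decode one integer item of the kinds produced by encode_int."""
    major, arg, pos = read_head(buf, 0)
    if major == 0:
        return arg
    if major == 1:
        return -1 - arg
    if major == 6 and arg in (2, 3):
        bmajor, length, pos = read_head(buf, pos)
        if bmajor != 2:
            raise ValueError("bignum content must be a byte string")
        magnitude = int.from_bytes(buf[pos:pos + length], "big")
        return magnitude if arg == 2 else -1 - magnitude
    raise ValueError("not an integer item")
```

For example, `encode_int(2**64)` gives `c249010000000000000000`. A C library has no built-in equivalent of Python's `int` to return from `decode_int`, which is where the dependency question above comes in.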
> One example is the tag 24 "Encoded CBOR data item" (Section 3.4.5.1), which indicates that the following byte string is encoded as CBOR. Since this string has the size in bytes, every array or map can be embedded in such tags to ensure the easy skippability.
This only works for types that aren't nested, unless you significantly complicate bookkeeping during serialization (storing the byte size of every compound object up front), which could seriously slow serialization down. My approach would be to let individual apps do that if they want (encode the size manually), because I don't think it's a common use case.
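Here is one way to do the bookkeeping described above. It assumes a toy format of non-negative integers and arrays, with every array wrapped in tag 24, and the function names are hypothetical. `encode_buffered` encodes each child into its own buffer and then copies it into the parent, so a deeply nested byte gets copied roughly once per enclosing level. `encode_two_pass` instead records every array's content size in a first pass and then writes straight through:

```python
def head(major: int, arg: int) -> bytes:
    """Encode an item head using the shortest argument form."""
    if arg < 24:
        return bytes([major << 5 | arg])
    for info, n in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if arg < 1 << (8 * n):
            return bytes([major << 5 | info]) + arg.to_bytes(n, "big")
    raise ValueError("argument does not fit in 64 bits")


def head_len(arg: int) -> int:
    """Byte length of an item head whose argument is `arg`."""
    if arg < 24:
        return 1
    for n in (1, 2, 4, 8):
        if arg < 1 << (8 * n):
            return 1 + n
    raise ValueError("argument does not fit in 64 bits")


def encode_buffered(value) -> bytes:
    """Naive: build each child's bytes first, then copy them into the parent."""
    if isinstance(value, int):
        return head(0, value)
    inner = head(4, len(value)) + b"".join(encode_buffered(v) for v in value)
    return head(6, 24) + head(2, len(inner)) + inner


def measure(value, sizes: dict) -> int:
    """Pass 1: return the encoded size of `value` and record the wrapped
    content size of every list, keyed by id()."""
    if isinstance(value, int):
        return head_len(value)
    content = head_len(len(value)) + sum(measure(v, sizes) for v in value)
    sizes[id(value)] = content
    return head_len(24) + head_len(content) + content


def write(value, sizes: dict, out: bytearray) -> None:
    """Pass 2: every length is already known, so bytes go straight to `out`."""
    if isinstance(value, int):
        out += head(0, value)
        return
    out += head(6, 24) + head(2, sizes[id(value)]) + head(4, len(value))
    for v in value:
        write(v, sizes, out)


def encode_two_pass(value) -> bytes:
    sizes: dict = {}
    measure(value, sizes)
    out = bytearray()
    write(value, sizes, out)
    return bytes(out)
```

Both produce identical bytes. The price is either repeated copying or an extra traversal plus a size table. That overhead compared with plain CBOR is what the comment is worried about.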