
Comment by camgunz

5 months ago

> Is it? Take JSON: its spec states that JSON numbers are theoretically infinite precision rationals, but implementations are free to impose their own restrictions.

Well like you say JSON gives you an out: "An implementation may set limits on the range and precision of numbers"; CBOR doesn't. I'm not really making a claim about CBOR vs. JSON (or HTTP). My TL;DR on this is: the nice thing about MP is that it asks very little of implementations, and that gives them a lot of freedom. CBOR asks way more of implementations--which by itself isn't bad--but it reckons with the tradeoffs not at all. A good example is "indefinite-length encodings"; this paragraph is still in the spec:

---

Note that some applications and protocols will not want to use indefinite-length encoding. Using indefinite-length encoding allows an encoder to not need to marshal all the data for counting, but it requires a decoder to allocate increasing amounts of memory while waiting for the end of the item. This might be fine for some applications but not others.

---

You might think, "well that's not so bad, maybe I don't have to implement this, after all it is in the seemingly optional 'Creating CBOR-Based Protocols' section", but unfortunately it's a core part of CBOR [1].

Confusingly, CBOR seems to care about this kind of thing: "The design does not allow nesting indefinite-length strings as chunks into indefinite-length strings. If it were allowed, it would require decoder implementations to keep a stack, or at least a count, of nesting levels." [2]. But what's the difference between having to keep a stack of nesting levels and having to allocate as much as your network peer tells you to?
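To make the allocation concern concrete, here is a minimal sketch of decoding an indefinite-length byte string as RFC 8949 lays it out: an initial byte `0x5f`, then any number of definite-length byte-string chunks, then the break byte `0xff`. The function name, the `max_bytes` parameter, and the `TooLarge` exception are illustrative assumptions, not part of any particular CBOR library.

```python
class TooLarge(Exception):
    """Hypothetical error: the peer sent more than we are willing to buffer."""


def read_argument(buf, pos, info):
    """Decode the length argument that follows an initial byte."""
    if info < 24:
        return info, pos
    if info not in (24, 25, 26, 27):
        raise ValueError("reserved or indefinite additional info")
    n = 1 << (info - 24)  # 1, 2, 4 or 8 bytes, big-endian
    if pos + n > len(buf):
        raise ValueError("truncated length")
    return int.from_bytes(buf[pos:pos + n], "big"), pos + n


def decode_indefinite_bytes(buf, max_bytes):
    """Return (payload, offset past the break byte)."""
    if not buf or buf[0] != 0x5F:
        raise ValueError("not an indefinite-length byte string")
    pos = 1
    out = bytearray()
    while True:
        if pos >= len(buf):
            raise ValueError("truncated: no break byte seen")
        head = buf[pos]
        pos += 1
        if head == 0xFF:  # break: the only way to learn the string is done
            return bytes(out), pos
        major, info = head >> 5, head & 0x1F
        if major != 2 or info == 31:
            # chunks must be definite-length byte strings (no nesting)
            raise ValueError("invalid chunk")
        length, pos = read_argument(buf, pos, info)
        if len(out) + length > max_bytes:
            # the total is never announced, so a cap is the only defence
            raise TooLarge("indefinite-length string exceeds local limit")
        if pos + length > len(buf):
            raise ValueError("truncated chunk")
        out += buf[pos:pos + length]
        pos += length


# Example from RFC 8949 Appendix A: (_ h'0102', h'030405')
data = bytes.fromhex("5f42010243030405ff")
print(decode_indefinite_bytes(data, max_bytes=16))
```

The decoder never learns the total size in advance; the only bound is the one it imposes itself, which is the point being made about trusting the peer.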

The fact is you can't reasonably implement streaming in a data representation format. It's protocol-level functionality, which is why HTTP/1.1's description of it is way more useful. Including it is pretty indicative of the whole "let's just throw some features into the spec and see what happens" attitude.

[0]: https://datatracker.ietf.org/doc/html/rfc8949#section-5.1

[1]: https://datatracker.ietf.org/doc/html/rfc8949#name-indefinit...

[2]: https://datatracker.ietf.org/doc/html/rfc8949#name-indefinit...

I admit I've only skimmed the RFC, but it seems to explicitly allow receivers to refuse to deal with lots of features. It doesn't say anywhere that compliant decoders MUST accept all well-formed CBOR inputs. In fact, it says quite the opposite.

> But what's the difference between having to keep a stack of nesting levels and having to allocate as much as your network peer tells you to?

In that you may preallocate a (large enough) buffer in the latter case, and bail out when the incoming message grows out of it but still be able to skip to the rest of the message as opposed to not being able to re-synchronize because the minimally needed parsing context grows without bound?

  • > I admit I've only skimmed the RFC, but it seems to explicitly allow receivers to refuse to deal with lots of features.

    That's true but like, at what point are people disappointed your library doesn't support functionality that they think is pretty core? I'm not saying libraries can or should support everything, just that specs that ask a lot of implementations are inviting this kind of thing, and that while MP considers this, CBOR pretty clearly does not.

    > In that you may preallocate a (large enough) buffer in the latter case, and bail out when the incoming message grows out of it but still be able to skip to the rest of the message as opposed to not being able to re-synchronize because the minimally needed parsing context grows without bound?

    Neither of these is recoverable because you don't know how much to skip to resync (assuming skipping a bunch of data isn't by itself fatal):

    - Stack: Maps and arrays only tell you their element/pair counts, not how many bytes to skip in a stream, so to skip them you have to parse them fully, because they may be nested.

    - Stream: By definition you don't know how much to skip.
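The "Stack" point can be sketched in code. The following is a rough illustration, assuming definite-length CBOR only (indefinite lengths are rejected): to find where an array or map ends, the skipper has to visit every nested item head, because the header carries an item count rather than a byte count. `skip_item` is a made-up name for this sketch.

```python
def skip_item(buf, pos=0):
    """Return (offset past the item at pos, number of item heads visited)."""
    pending = 1  # items still to consume; grows as containers open
    visited = 0
    while pending:
        if pos >= len(buf):
            raise ValueError("truncated item")
        head = buf[pos]
        pos += 1
        major, info = head >> 5, head & 0x1F
        if info < 24:
            arg = info
        elif info <= 27:
            n = 1 << (info - 24)  # 1, 2, 4 or 8 argument bytes
            if pos + n > len(buf):
                raise ValueError("truncated argument")
            arg = int.from_bytes(buf[pos:pos + n], "big")
            pos += n
        else:
            raise ValueError("indefinite length not handled in this sketch")
        pending -= 1
        visited += 1
        if major in (2, 3):    # byte/text string: arg is a byte count
            if pos + arg > len(buf):
                raise ValueError("truncated string")
            pos += arg
        elif major == 4:       # array: arg is an element count
            pending += arg
        elif major == 5:       # map: arg is a pair count
            pending += 2 * arg
        elif major == 6:       # tag: exactly one item follows
            pending += 1
        # majors 0, 1 and 7: the head plus argument is the whole item
    return pos, visited


# [1, [2, 3], {"a": 4}] followed by an unrelated null (0xf6)
buf = bytes.fromhex("8301820203a1616104" "f6")
print(skip_item(buf))  # → (9, 8)
```

Skipping the 9-byte array means reading all 8 item heads inside it; only strings can be jumped over by length.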

    Again these are the kinds of issues you'd address in a protocol definition. Though it kind of tries to pose as one, CBOR is not a protocol definition. A more reasonable thing to do would be to stream MP over HTTP, because you get so many things for free (connection management, caching and TLS to name a few).
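For contrast, here is a small sketch of what protocol-level framing buys you, using HTTP/1.1 chunked transfer coding as the example. Each chunk announces its size in bytes up front, so a receiver that hits its limit can still walk to the end of the body and pick up whatever follows. The function name and `max_bytes` are illustrative assumptions, and the sketch ignores trailer fields.

```python
def read_chunked(body, max_bytes):
    """Return (payload or None if over the limit, offset past the body)."""
    pos = 0
    out = bytearray()
    over_limit = False
    while True:
        eol = body.find(b"\r\n", pos)
        if eol < 0:
            raise ValueError("truncated chunk-size line")
        # chunk size is hex; anything after ';' is a chunk extension
        size = int(body[pos:eol].split(b";", 1)[0], 16)
        pos = eol + 2
        if size == 0:
            # assumption: no trailer fields, so the body ends with CRLF
            if body[pos:pos + 2] != b"\r\n":
                raise ValueError("trailer fields not handled in this sketch")
            return (None if over_limit else bytes(out)), pos + 2
        end = pos + size
        if body[end:end + 2] != b"\r\n":
            raise ValueError("malformed or truncated chunk")
        if not over_limit and len(out) + size <= max_bytes:
            out += body[pos:end]
        else:
            # too big: drop what we have, but keep skipping by byte count
            over_limit = True
            out.clear()
        pos = end + 2


body = b"4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\nNEXT"
print(read_chunked(body, max_bytes=100))  # → (b'Wikipedia', 24)
print(read_chunked(body, max_bytes=5))    # → (None, 24)
```

In both calls the returned offset points at `NEXT`, so the receiver stays in sync even when it refuses the payload.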