
Comment by camgunz

5 months ago

Sure, I think CBOR's "suggested" tags (or whatever they are) are probably useful to most people. The tradeoff is that they create pressure for implementations to support them, and that's not free. For example, bignum libraries are pretty heavyweight; they're not really the kind of thing you'd want to include in a C implementation as a dependency, especially when very few of your users will use them. Well OK, now you have a choice between:

- include it anyway: bloat your library for almost everyone, maybe support multiple underlying implementations, and manage all those dependencies forever; also, those libraries have different ways of setting precision and of allocating statically vs. dynamically, so you have to expose that somehow

- don't include it: you're now probably incompatible with all the dynamic-language implementations that get bignums for free, and you should note that up front

This is just one example, but it's pretty representative of Bormann's "have your cake and eat it too" design instincts where he tosses on features and doesn't consider the tradeoffs.

> One example is the tag 24 "Encoded CBOR data item" (Section 3.4.5.1), which indicates that the following byte string is encoded as CBOR. Since this string has the size in bytes, every array or map can be embedded in such tags to ensure the easy skippability.

This only works for types that aren't nested unless you significantly complicate bookkeeping during serialization (storing the byte size of every compound object up front), which can seriously slow down serializing. My approach to that would be to let individual apps do that if they want (encode the size manually), because I don't think it's a common usage.
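For what it's worth, here's a rough C sketch of what that wrapping ends up looking like (`emit_head` and `wrap_in_tag24` are made-up names, not from any particular library): the nested item has to be fully encoded into a scratch buffer before the tag 24 byte-string head can be written, because that head carries the exact byte length. For nested tag 24 items this scratch-buffer pass repeats at every level, which is exactly the bookkeeping/copying overhead in question.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Encode a CBOR head: 3-bit major type plus an unsigned argument
 * (RFC 8949 "initial byte" encoding). Returns the number of bytes written.
 * Caller must ensure the output buffer has room. */
static size_t emit_head(uint8_t *out, uint8_t major, uint64_t arg)
{
    if (arg < 24) {
        out[0] = (uint8_t)(major << 5 | arg);
        return 1;
    }
    if (arg <= UINT8_MAX) {
        out[0] = (uint8_t)(major << 5 | 24);
        out[1] = (uint8_t)arg;
        return 2;
    }
    if (arg <= UINT16_MAX) {
        out[0] = (uint8_t)(major << 5 | 25);
        out[1] = (uint8_t)(arg >> 8);
        out[2] = (uint8_t)arg;
        return 3;
    }
    if (arg <= UINT32_MAX) {
        out[0] = (uint8_t)(major << 5 | 26);
        for (int i = 0; i < 4; i++)
            out[1 + i] = (uint8_t)(arg >> (24 - 8 * i));
        return 5;
    }
    out[0] = (uint8_t)(major << 5 | 27);
    for (int i = 0; i < 8; i++)
        out[1 + i] = (uint8_t)(arg >> (56 - 8 * i));
    return 9;
}

/* Wrap an already-encoded nested item in tag 24 ("Encoded CBOR data item").
 * The byte-string head needs the exact encoded length up front, so the nested
 * item must first be fully encoded into a scratch buffer; that extra pass and
 * copy is the bookkeeping cost described above. */
static size_t wrap_in_tag24(const uint8_t *inner, size_t inner_len, uint8_t *out)
{
    size_t pos = emit_head(out, 6, 24);        /* major type 6: tag, value 24 */
    pos += emit_head(out + pos, 2, inner_len); /* major type 2: byte string   */
    memcpy(out + pos, inner, inner_len);
    return pos + inner_len;
}
```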

> Well OK, now you have a choice between: - include it anyway, [...] - don't include it, [...]

So I guess that's why MP doesn't have a bignum. But MP's inability to store anything more than (u)int64 and float64 does make its data model technically different from JSON, because JSON never properly specified that its number format should be round-trippable in those native types. Even worse, if you could assume that everything fits in float64, you'd still have to write a considerable amount of subtle code to do the correct round-trip! [1] At that point your code would already contain some bignum stuff anyway. So why not support bignums then?
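As a concrete (if contrived) C illustration: 2^53 + 1 is a perfectly valid JSON number and a valid uint64, but a decoder that stores every number as a float64 silently corrupts it.

```c
#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>

int main(void)
{
    /* 2^53 + 1: a valid JSON number and a valid uint64,
     * but not exactly representable as an IEEE 754 double. */
    uint64_t original = 9007199254740993ULL;
    double as_double = (double)original;  /* what a "numbers are doubles" decoder keeps */
    uint64_t back = (uint64_t)as_double;

    printf("original:   %" PRIu64 "\n", original); /* 9007199254740993 */
    printf("via double: %" PRIu64 "\n", back);     /* 9007199254740992 */
    return 0;
}
```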

[1] Correct floating point formatting and parsing is very difficult and needs a non-trivial number of precomputed tables and sometimes bignum routines, depending on the exact algorithm; for the record, I'm the main author of Rust's floating point formatting routine. Also for this reason, most language standard libraries already have hidden support for size-limited bignums!
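A tiny C demo of the trap (just printf, nothing like Rust's actual algorithm): a fixed low precision loses information, while a fixed high precision always round-trips but prints noise digits, which is why producing the shortest correct output takes real algorithmic work.

```c
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    double x = 0.1 + 0.2;  /* actually 0.3000000000000000444... */
    char buf[64];

    /* Low precision looks pretty but does not round-trip. */
    snprintf(buf, sizeof buf, "%.15g", x);
    printf("%-22s round-trips: %s\n", buf, strtod(buf, NULL) == x ? "yes" : "no");

    /* 17 significant digits always round-trip a double,
     * but print noise digits: "0.30000000000000004". */
    snprintf(buf, sizeof buf, "%.17g", x);
    printf("%-22s round-trips: %s\n", buf, strtod(buf, NULL) == x ? "yes" : "no");
    return 0;
}
```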

> My approach to that would be to let individual apps do that if they want (encode the size manually), because I don't think it's a common usage.

I mean, the supposed processability is already a poorly defined metric, as I wrote earlier. I too suppose that it would be entirely up to the application's (or possibly the library's educated) request.

  • > But MP's inability to store anything more than (u)int64 and float64 does make its data model technically different from JSON....

    Yeah I don't love the MP/JSON comparison the site pushes. I don't really think they solve the same problems, but the reasons are kind of obscure so shrug. MP is quite different from JSON and yeah, numbers is one of those ways.

    > [1] Correct floating point formatting and parsing is very difficult and needs a non-trivial number of precomputed tables and sometimes bignum routines, depending on the exact algorithm; for the record, I'm the main author of Rust's floating point formatting routine. Also for this reason, most language standard libraries already have hidden support for size-limited bignums!

    Oh man yeah tell me about it; I attempted this way back when and gave up lol. I was doing a bunch of research into arbitrary precision libraries and the benchmarks all contain "rendering a big 'ol floating point number" and that's why. Wild.

    > I mean, the supposed processability is already a poorly defined metric, as I wrote earlier. I too suppose that it would be entirely up to the application's (or possibly the library's educated) request.

    I think in practice implementations are either heavily spec'd (FIDO) on top of a restricted subset of CBOR, or they control both sender and receiver. This is why I think much of the additional protocol discussion in CBOR is pretty moot; if you're taking the CBOR spec's advice on protocols you're not building a good protocol.

    • > Oh man yeah tell me about it; I attempted this way back when and gave up lol. I was doing a bunch of research into arbitrary precision libraries and the benchmarks all contain "rendering a big 'ol floating point number" and that's why. Wild.

      Yes, it's something people generally don't even realize exists. To my knowledge, only RapidJSON and simdjson have seriously invested in optimizing this aspect; their authors do know this stuff and its difficulty. Others tend to use a performant but not optimal library like double-conversion (which was the state of the art at the time of its release!).

> Well OK, now you have a choice between: - include it anyway, [...] - don't include it, [...]

I do not see an issue here. In the decoder, you don't need a bignum library; just pass the bignum to the application as a memory blob.

In the application, you know the semantic restrictions on the given values, so you either reject bignums as semantically invalid (out of range) or you need a bignum processing library anyway.
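A minimal sketch of that decoder-side approach in C (all of these names are made up for illustration, not from an actual library):

```c
#include <stdint.h>
#include <stddef.h>
#include <stdbool.h>

/* Raw view of a CBOR bignum (tags 2/3, RFC 8949 section 3.4.3):
 * the decoder never interprets the magnitude, it just exposes the bytes. */
typedef struct {
    const uint8_t *bytes;  /* big-endian magnitude, exactly as on the wire */
    size_t len;
    bool negative;         /* tag 3 encodes -1 - n */
} cbor_bignum_view;

/* Application-supplied callback: reject, convert, or hand off to GMP/MPFR. */
typedef int (*on_bignum_fn)(void *ctx, cbor_bignum_view num);

/* Inside the decoder: after reading tag 2 or 3 and the byte string that
 * follows it, forward the bytes as-is; no bignum arithmetic needed here. */
static int handle_bignum(uint64_t tag, const uint8_t *payload, size_t len,
                         on_bignum_fn cb, void *ctx)
{
    cbor_bignum_view v = { payload, len, tag == 3 };
    return cb(ctx, v);
}

/* Example application policy: anything wider than 64 bits is rejected
 * as semantically out of range for this protocol. */
static int reject_wide(void *ctx, cbor_bignum_view num)
{
    (void)ctx;
    return num.len <= 8 ? 0 : -1;
}
```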

  • Nah it's a pain in the ass if I'm writing a C program to consume your API and I need to pull in MPFR because you used bignums.