
Comment by camgunz

5 months ago

> You can replace "pull in MPFR" with "work any harder than just using `double`". Bignums are an obvious pain in the ass; I can think of no data representation formats that include support for them and that's why.

I'm aware of plenty (though I have surveyed at least 20 formats in the past, so that would include the more obscure ones). At the very least, you can feed it back to sscanf if you are fine with an ordinary float or double; a thoughtful API would include this as an option too. That's what I expect from the supposed bignum support: round-trippability.

  • Maybe an example is useful. I want to build a generic CBOR decoder in C. I have 2 options:

    - link GMP/mpdecimal/whatever (or hey, provide an abstraction layer and let a user choose)

    - accept function pointers to handle bignum tags

    Function pointers are an irritation (I know this because my MP library uses them): they're slower than not using them, you've gotta check for NULL a lot, and you're also asking any application that uses your library and wants bignum support to include GMP itself (with all the attendant maintenance, setup, etc.).

    Or, you can include it yourself, but welcome to doing all the maintenance yourself, and exposing all of GMP's knobs (ex: [0])
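
    For illustration, the function-pointer option might look roughly like the following. This is only a sketch: `bignum_hooks`, `decode_bignum`, and `record_len` are hypothetical names, not taken from any real CBOR or MP library, and it assumes the decoder hands the handler the raw big-endian magnitude bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback table a generic decoder might accept for bignum tags. */
struct bignum_hooks {
    /* Receives the raw big-endian magnitude; returns 0 on success. */
    int (*on_bignum)(void *user, int negative, const uint8_t *bytes, size_t len);
    void *user; /* opaque pointer passed back to the handler */
};

/* Hypothetical decoder step: dispatch a bignum, or report that no handler exists. */
int decode_bignum(const struct bignum_hooks *hooks, int negative,
                  const uint8_t *bytes, size_t len)
{
    /* This is the NULL checking the comment complains about. */
    if (hooks == NULL || hooks->on_bignum == NULL)
        return -1;
    return hooks->on_bignum(hooks->user, negative, bytes, len);
}

/* Example application-side handler: only records the payload length.
 * A real one would hand the bytes to GMP or a similar library. */
int record_len(void *user, int negative, const uint8_t *bytes, size_t len)
{
    (void)negative;
    (void)bytes;
    *(size_t *)user = len;
    return 0;
}
```

    Even in this toy form, the application still has to bring its own bignum library to do anything useful inside the handler.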

    You might argue that these aren't the only options, but a deserialized value has to be understood by the application; your suggestions aren't good tradeoffs. sscanf (also do not use sscanf) doesn't work if the value is actually a bignum, and yielding a bespoke bignum format is just as unusable as simply returning whatever's encoded in CBOR. How would I add two such values together? How would I display it? This is what bignum libraries are for.

    All this is made far worse by the fact that there are effectively no public CBOR (or MP) APIs where you're expecting them to be consumed entirely by generic decoders, so there's not even a need to force generic decoders to go through all this effort to support bignums (etc.). Further, unlike MP, CBOR doesn't let you use tags for application-specific purposes. Put it all together and it's uniformly worse: implementations are either more complex or have surprising holes, you can't count on generic decoders supporting tags when building an API or defining messages, and you can't even just say, "for this protocol, tag 31 is a UUID".

    This is probably a big reason (though I can think of others) why the only formats you can think of w/ bignum support are obscure.

    > That's what I expect for the supposed bignum support: round-trippability.

    Round-tripping is only meaningful if a receiver can use the values before reserializing, otherwise memcpy meets your requirements. If a sender gives me a serialized bignum, the deserializing library has to deserialize it into a value I can understand and use; that's the whole point of a deserialization library.

    MP's support for timestamps is a reasonable example here: it decomposes into a time_t, and it can do this because it defines the max size. You can't do that w/ a bignum--the whole point of a bignum is it's big beyond defining. A CBOR sender can send you an infinite series of digits, and the spec doesn't reckon with this at all.
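
    To make the size problem concrete: a decoder without a bignum library can at best accept the bignums that happen to fit a native integer and refuse the rest. A minimal sketch, assuming the decoder has already extracted the payload of a CBOR tag 2 (unsigned bignum), which is a big-endian byte string; `bignum_to_u64` is a hypothetical helper, not part of any library.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: fold a big-endian magnitude into a uint64_t.
 * Returns false when the value does not fit in 64 bits. */
bool bignum_to_u64(const uint8_t *bytes, size_t len, uint64_t *out)
{
    /* Skip leading zero bytes so zero-padded encodings are still accepted. */
    while (len > 0 && bytes[0] == 0) {
        bytes++;
        len--;
    }
    if (len > sizeof(uint64_t))
        return false; /* genuinely big: the caller has to decide what to do */

    uint64_t value = 0;
    for (size_t i = 0; i < len; i++)
        value = (value << 8) | bytes[i];
    *out = value;
    return true;
}
```

    Anything past the `return false` is exactly the territory where a real bignum library (or a hard size limit in the protocol) becomes necessary.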

    [0]: https://gmplib.org/manual/Memory-Management

    • > I have 2 options:
      > - link GMP/mpdecimal/whatever (or hey, provide an abstraction layer and let a user choose)
      > - accept function pointers to handle bignum tags

      I would just provide two kinds of functions:

          // For each representative native type...
          cbor_read_t cbor_read_float(struct cbor *ctx, float *f);
      
          // And there is a generic number handling:
          struct cbor_num {
              int sign; // -1, 0 or 1
              int base; // 10 or 16
              int exponent;
              const char *digits;
              size_t digits_len;
          };
          cbor_read_t cbor_read_number(struct cbor *ctx, struct cbor_num *num);
      
          // And then someone will define the following on top of cbor_read_number:
          cbor_read_t my_cbor_read_mpz(struct cbor *ctx, mpz_t num);
      

      Memory lifetime and similar concerns also have to be considered here (left as an exercise), but the point is that you never need function pointers in this case. In fact I would actively avoid them, because proper function pointer support is indeed a PITA, as you said. They can generally be avoided with a (sort of) inversion of control, which is popular in compact C APIs and to some extent also in Rust APIs. You just haven't considered this possibility.
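
      As a rough illustration of such an adapter, here is one that turns the generic number into an `int64_t` without any bignum library. It is only a sketch: `cbor_num_to_i64` is a hypothetical name, and it assumes the value is `sign * digits * base^exponent` with `digits` not NUL-terminated, neither of which is pinned down by the declarations above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Same layout as the struct sketched above. */
struct cbor_num {
    int sign;           /* -1, 0 or 1 */
    int base;           /* 10 or 16 */
    int exponent;       /* assumed: value = sign * digits * base^exponent */
    const char *digits; /* assumed: digits_len bytes, not NUL-terminated */
    size_t digits_len;
};

static int digit_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Hypothetical adapter: succeeds only for integers that fit in int64_t. */
bool cbor_num_to_i64(const struct cbor_num *num, int64_t *out)
{
    if (num->base != 10 && num->base != 16) return false;
    if (num->exponent < 0) return false; /* fractions are out of scope here */

    const uint64_t base = (uint64_t)num->base;
    uint64_t mag = 0;
    for (size_t i = 0; i < num->digits_len; i++) {
        int d = digit_value(num->digits[i]);
        if (d < 0 || d >= num->base) return false;
        if (mag > (UINT64_MAX - (uint64_t)d) / base) return false; /* overflow */
        mag = mag * base + (uint64_t)d;
    }
    for (int e = 0; e < num->exponent && mag != 0; e++) {
        if (mag > UINT64_MAX / base) return false; /* overflow */
        mag *= base;
    }

    if (num->sign == 0 || mag == 0) { *out = 0; return true; }
    if (num->sign > 0) {
        if (mag > (uint64_t)INT64_MAX) return false;
        *out = (int64_t)mag;
    } else {
        if (mag > (uint64_t)INT64_MAX + 1) return false;
        *out = (mag == (uint64_t)INT64_MAX + 1) ? INT64_MIN : -(int64_t)mag;
    }
    return true;
}
```

      An `mpz_t` adapter would have the same shape, with the accumulation loop replaced by calls into the bignum library of the application's choice.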

      > sscanf (also do not use sscanf) doesn't work if the value is actually a bignum, and yielding a bespoke bignum format is just as unusable as simply returning whatever's encoded in CBOR. How would I add two such values together? How would I display it? This is what bignum libraries are for.

      In practice many bignums are just left as is. For example, X.509 certificate serial numbers are technically bignums, but you never compute anything with them, so you don't need a bignum library to read serial numbers. If you do need computation, then you need an adapter function as above, but the library proper needs no knowledge of such an adapter. So what's the problem?

      By the way, sscanf is fine here because the API's contract constrains sscanf's inputs enough to be safe. sscanf in general is also safe when every `char *` output is bounded. It is certainly a difficult beast, but so is everything about C.
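
      A small sketch of what "bounded" means in practice: every string conversion carries a field width one less than its buffer size. `split_pair` and the `key=value` input format are made up for the example. Numeric conversions such as `%lf` have their own out-of-range caveats and are deliberately left out.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helper: split "key=value" into two fixed-size buffers.
 * %15[^=] stores at most 15 characters plus the terminating NUL into key;
 * %31s stores at most 31 characters plus the NUL into val. */
bool split_pair(const char *input, char key[16], char val[32])
{
    return sscanf(input, "%15[^=]=%31s", key, val) == 2;
}
```

      A key longer than 15 characters fails to match the literal `=` that follows, so the call reports failure instead of overrunning the buffer.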
