Comment by jjcob

2 days ago

One problem I was struggling with: What's the shortest, unambiguous decimal representation of a float?

For example, if you use single precision floats, then you need up to 9 digits of decimal precision to uniquely identify a float. So you would need to use a printf pattern like %.9g to print it. But then 0.1 would be output as 0.100000001, which is ugly. So a common approach is to round to 6 decimal digits: If you use %.6g, you are guaranteed that any decimal input up to 6 significant digits will be printed just like you stored it.

But you would no longer be round-trip safe when the number is the result of a calculation. This is important when you do exact comparisons between floats (eg. to check if data has changed).

So one idea I had was to try printing the float with 6 digits, then scanning it and seeing if it resulted in the same binary representation. If not, try using 7 digits, and so on, up to 9 digits. Then I would have the shortest decimal representation of a float.

This is my algorithm:

    int out_length;
    char buffer[32];
    for (int prec = 6; prec<=9; prec++) {
        out_length = sprintf(buffer, "%.*g", prec, floatValue);
        if (prec == 9) {
            break;
        }
        float checked_number;
        sscanf(buffer, "%g", &checked_number);
        if (checked_number == floatValue) {
            break;
        }
    }

I wonder if there is a more efficient way to determine that shortest representation rather than running printf/scanf in a loop?

Your problem is practically important as it can be considered as the "canonical" string representation of given floating point number (after adding the closestness requirement). There are many efficient algorithms as a result, selected algorithms include Dragon4, Grisu3, Ryu and Dragonbox. See also Google's double-conversion library which implements first two of them.

  • Thank you for pointing me to all the prior work on this!

    I am surprised how complex the issue seems to be. I assumed there might be an elegant solution, but the problem seems to be a lot harder than I thought.

    • See also the code in LLVM Flang for binary/decimal conversions. Minimal-digit decimals don’t come up in Fortran as often as other languages, but I use them when allowed by the output format. Interestingly, the last digit of a minimal decimal conversion is often irrelevant, so long as it exists.

  • Since such algorithms were developed in the 1990's, nowadays you can expect your language's standard library to use them for float-to-decimal and decimal-to-float conversions. So all you need to do in code is to print the float without any special formatting instructions, and you'll get the shortest unique decimal representation.

    • Except that C specifies that floating point numbers should be printed in a fixed precision (6 decimal digits) when no precision is given. Internally they do use some sort of float-to-decimal algorithms [1], but you can't get the shortest representation out of them.

      [1] Some (e.g. Windows CRT) do use the shortest representation as a basis, in which case you can actually extract it with large enough precision (where all subsequent digits will be zeros). But many libcs print the exact representation instead (e.g. 3.140000000000000124344978758017532527446746826171875 for `printf("%.51f", 3.14)`), so they are useless for our purpose.

      5 replies →

Don't worry about negative comments, yes, this is not the best way, but (if there's no error, I didn't analyze it thoroughly) it's often good enough; if it works, then it works.

I once wanted to find a vector for which Euler rotation (5°, 5°, 0) will result with the same vector, so I just ran a loop of million iterations or so which would take a starting vector, translate it randomly slightly (add a small random vector) and see if after rotation it's closer to original than the previous vector would be after rotation, if not, discard the change otherwise keep it. The script ran for a couple seconds on Python and with decreasing translation vector based on iteration number, I got perfect result (based on limited float precision of the vector). 2s would be terribly slow in a library but completely satisfying for my needs :D

  • FYI, you can solve your problem in closed form by converting your Euler angles into the 'rotation vector'[1] or 'axis-angle'[2] representations and then normalizing the resulting vector.

    [1]: https://en.wikipedia.org/wiki/Axis%E2%80%93angle_representat...

    [2]: https://en.wikipedia.org/wiki/Axis%E2%80%93angle_representat...

    • Cool! Thanks for that; it makes perfect sense: if you convert the rotation to axis-angle, and then take that axis, obviously a vector rotated around the axis defined by itself won't change.

      I just tested it:

          from bpy import context as C
          from mathutils import Vector, Euler
          from math import radians as rad
      
          r = Euler((rad(5), rad(5), 0))
          ob = C.object
          ob.rotation_euler = r
          ob.rotation_mode = 'AXIS_ANGLE'
          a, x, y, z = ob.rotation_axis_angle
          v = Vector((x, y, z))
          print(v)
          v.rotate(r)
          print(v)
          print("--")
      
      

      Can be done without using an object:

          from mathutils import Vector, Euler
          from math import radians as rad
      
          r = Euler((rad(5), rad(5), 0))
          v = Vector(r.to_quaternion().axis)
          print(v)
          v.rotate(r)
          print(v)
          print("--")

> I wonder if there is a more efficient way to determine that shortest representation rather than running printf/scanf in a loop?

Yes, just `printf("%f", ...);` will get you that.

The actual algorithms to do the float->string conversion are quite complicated. Here is a recent pretty good one: https://github.com/ulfjack/ryu

I think there's been an even more recent one that is even more efficient than Ryu but I don't remember the name.

No, just no. And, never use sscanf().

This is totally pointless when serialization to and from an unsigned integer that's binary equivalent would be perfectly reversible and therefore wouldn't lose any information.

    double f = 0.0/0.0; // might need some compiler flags to make this a soft error.
    double g;
    char s[9];

    assert(sizeof double == sizeof uint64_t);

    snprintf(s, 9, "%0" PRIu64, *(uint64_t *)(&f));

    snscanf(s, 9, "%0" SCNu64, (uint64_t *)(&g));

If you want something shorter, apply some sort of heuristic that doesn't sacrifice faithful reproduction of the original representation, e.g., idempotency.

  • Off-topic, For this kind of pointer casting, shouldn't you be using a union? I believe this is undefined behaviour, as written.

    • As written, it is UB, yes, but certainly in C++, and, I think, also in C, using a union is undefined behavior, too. I think (assuming isn’t and float to be of the same size) the main risk is that, if you do

         union {
           float f;
           int i;
         } foo;
         foo.f = 3.14;
         printf(“%x”, foo.i);
      

      that the compiler can think the assignment to foo.f isn’t used anywhere, and thus can chose not to do it.

      In C++, you have to use memmove (compilers can and often do recognize that idiom)

    • Yes, it violates the standard, although in practice it should work because the alignment is the same.

  • If the goal is just to reliably serialize and deserialize floating point numbers, use `%a` instead. It will print and read hexadecimal floating point literals.