Comment by sharikous

4 years ago

As one who understands floats I really wish there was a better notation for literals, the best would be a floating literal in binary representation.

For integers you can write 0x0B, 11, 0b1011 and have a very precise representation

For floats you write 1e-1 or 0.1 and you get an ugly truncation. If it were possible to write something like 1eb-1 (for 0.5) and 1eb-2 (for 0.25)... people would be incentivized to use nice negative-power-of-2 floats, which are much less error-prone than ugly base conversions.

This way you can overcome the fears around floats being inexact and start writing more accurate tests (bit for bit) in many cases.
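
A minimal sketch of the difference, in Python purely for illustration (the variable names are made up for this example). `Fraction(x)` recovers the exact value a float stores, so it shows which literals survive the base conversion intact:

```python
from fractions import Fraction

# Fraction(float) gives the exact rational value the float actually holds.
tenth_is_exact = Fraction(0.1) == Fraction(1, 10)  # decimal literal is rounded on input
half_is_exact = Fraction(0.5) == Fraction(1, 2)    # a power of two is stored exactly

# Bit-for-bit equality holds for sums of small negative powers of two...
dyadic_sum_exact = 0.5 + 0.25 == 0.75
# ...but not for the classic decimal case.
decimal_sum_exact = 0.1 + 0.2 == 0.3

print(tenth_is_exact, half_is_exact, dyadic_sum_exact, decimal_sum_exact)
# prints: False True True False
```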

7 comments

jacobolus 4 years ago

> better notation for literals [...] something like 1eb-1 (for 0.5) and 1eb-2 (for 0.25)

There are floating point hex literals. These can be written as 0x1p-1 == 0.5 and 0x1p-2 == 0.25.

You can use them in C/C++, Java, Julia, Swift, ..., but they are not supported everywhere.

https://observablehq.com/@jrus/hexfloat
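
As an illustration of "not supported everywhere": Python has no hex float literal syntax, but as far as I know the standard `float.fromhex` and `float.hex` methods accept and produce the same notation. A small sketch (variable names are only for this example):

```python
# Parse the same notation as the C-style literals 0x1p-1 and 0x1p-2.
half = float.fromhex("0x1p-1")
quarter = float.fromhex("0x1p-2")
print(half, quarter)  # prints: 0.5 0.25

# float.hex() shows the exact stored bits of a decimal literal.
tenth_hex = (0.1).hex()
print(tenth_hex)  # prints: 0x1.999999999999ap-4

# The hex form round-trips without any rounding.
round_trip_ok = float.fromhex(tenth_hex) == 0.1
print(round_trip_ok)  # prints: True
```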

Aardwolf 4 years ago

C++ hex floats are an interesting combination of 3 numeric bases in one!

- the mantissa is written in base 16
- the exponent is written in base 10
- the exponent itself is a power of 2 (not of 16 or 10), so that's base 2

One can only wonder how that came to be. I think they chose base 10 for the exponent to allow using the 'f' suffix to denote float (as opposed to double).
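
To make the three bases concrete, here is a rough Python sketch that decodes the notation by hand. `decode_hexfloat` is a hypothetical helper written for this comment, not a library function, and it assumes a positive, well-formed string shaped like `0xH.HHHp±D` with no suffix:

```python
def decode_hexfloat(text: str) -> float:
    """Hypothetical helper: decode a positive string such as '0x1.8p3'."""
    # Assumes the text starts with '0x' and contains exactly one 'p'.
    mantissa_text, exponent_text = text.lower()[2:].split("p")
    whole, _, frac = mantissa_text.partition(".")
    # Base 16: the mantissa digits, including those after the point.
    mantissa = int(whole + frac, 16) / 16 ** len(frac)
    # Base 10: the digits of the exponent.
    exponent = int(exponent_text, 10)
    # Base 2: the exponent scales the mantissa by a power of two.
    return mantissa * 2.0 ** exponent


print(decode_hexfloat("0x1.8p3"))  # 1.5 * 2**3, prints: 12.0
print(decode_hexfloat("0x1p-1"))   # prints: 0.5
```

For these inputs the result should agree with the built-in `float.fromhex`.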

adgjlsfhk1 4 years ago

Julia is missing 32 bit and 16 bit hex floats unfortunately.

simonbyrne 4 years ago

You can just wrap the literal in a conversion function, eg Float32(0x1p52), which should get constant propagated at compile time.

wruza 4 years ago

Are you asking for pow(2, n)? Or for `float mkfloat(int e, int m)` which literally implements the given formula? I doubt that the notation you’re suggesting will be used in code, except for really rare bitwise cases.

The initial precision doesn’t really matter, because if you plan to use this value in a computation, it will quickly accumulate an error, which you have to deal with anyway. There are three ways to deal with it: 1) ignore it, 2) account for it, 3) use numeric methods which retain it in a decent range. You may accidentally (1)==(3), but the problem doesn’t go away in general.
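
A small Python sketch of that point, using only the standard `math` module (the variable names are just for illustration). The input 2.0 is exact, yet one round trip through `sqrt` already leaves an error; the last lines show options (2) and (3) in miniature:

```python
import math

# 2.0 is exactly representable, but the computation still picks up an error.
squared = math.sqrt(2.0) ** 2
exact_match = squared == 2.0
print(squared, exact_match)  # prints: 2.0000000000000004 False

# (2) Account for it: compare with a tolerance instead of bit-for-bit.
close_enough = math.isclose(squared, 2.0, rel_tol=1e-12)

# (3) Use a method that keeps the error small: math.fsum tracks the
# rounding that a plain left-to-right sum() accumulates.
naive_total = sum([0.1] * 10)
careful_total = math.fsum([0.1] * 10)
print(close_enough, naive_total, careful_total)
# prints: True 0.9999999999999999 1.0
```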

stncls 4 years ago

I'm not sure this is what you are asking for, but would this be suitable?

> ISO C99 and ISO C++17 support floating-point numbers written not only in the usual decimal notation, such as 1.55e1, but also numbers such as 0x1.fp3 written in hexadecimal format. [...] The exponent is a decimal number that indicates the power of 2 by which the significant part is multiplied. Thus ‘0x1.f’ is 1 15/16, ‘p3’ multiplies it by 8, and the value of 0x1.fp3 is the same as 1.55e1.

https://gcc.gnu.org/onlinedocs/gcc/Hex-Floats.html