Comment by GistNoesis

1 day ago

Z-fighting (https://en.wikipedia.org/wiki/Z-fighting) is proof that you often need those 64 bits.

It's kind of a chicken-and-egg problem: people use floats because FPUs are available. All the engineering effort that went into dealing with floats and the problems that come with them would have been better invested in making integers faster.

We went down the wrong path, and inertia keeps us on it. And now the wrong path is even more tempting, because all that effort has made it more practical and almost as good. We hide the precision complexity from the programmer, but it's still lurking around instead of being tamed.

The absolute GPU cluster-fuck, with as many floating-point types as you can write on a napkin while drunk at the bar, means that at the end of the day your neural network is non-deterministic, and you can't replicate any result from your program from 6 months ago, or from the previous library version. Your simulation results are therefore perishable.
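To make the non-determinism point concrete, here is a minimal sketch in plain Python (not GPU code; the values are picked purely for illustration). Float addition is not associative, so any change in reduction order, for example a different kernel or library version, can change the result. Integer addition does not have this problem.

```python
# Floating-point addition is not associative: the grouping changes the result.
a, b, c = 1e16, -1e16, 1.0

left = (a + b) + c    # a and b cancel exactly, then 1.0 is added
right = a + (b + c)   # 1.0 is lost when added to -1e16 (the spacing there is 2.0)

print(left, right)    # the two groupings disagree

# The same sum with integers gives the same answer in either order.
ia, ib, ic = 10**16, -(10**16), 1
int_left = (ia + ib) + ic
int_right = ia + (ib + ic)
print(int_left, int_right)
```

A parallel sum on a GPU is effectively free to pick either grouping, which is one way results can drift between runs or library versions.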

Inability to replicate results means that you can't verify that weight modifications to your neural networks haven't been tampered with by an adversary. So you lose any fighting chance of building a secure system.

You also can't share work in a distributed fashion: since verification is not possible, you can't trust any computation that you haven't done yourself.

On the CPU side, yes, 64 bits is a good idea, but when transferring to the GPU you simply make the camera location (0, 0, 0) and transform everything relative to it. That way you can easily use 32-bit floats and have no z-fighting or any other precision-related issues (a logarithmic depth buffer also helps).
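A rough sketch of the camera-relative trick, in plain Python on a single coordinate. The helper to_f32 and the positions are made up for illustration; to_f32 just rounds a Python float (a 64-bit double) to the nearest 32-bit float, standing in for what the GPU would store.

```python
import struct

def to_f32(x: float) -> float:
    """Round a 64-bit Python float to the nearest 32-bit float value."""
    return struct.unpack("f", struct.pack("f", x))[0]

# Hypothetical world-space coordinates, kept as 64-bit doubles on the CPU.
camera = 10_000_000.0
world_vertex = 10_000_000.123   # 12.3 cm away from the camera

# Naive: send world coordinates to the GPU as 32-bit floats, subtract there.
# Near 1e7 a 32-bit float can only represent whole units, so the offset vanishes.
naive = to_f32(world_vertex) - to_f32(camera)

# Camera-relative: subtract in 64-bit on the CPU, then convert the small offset.
relative = to_f32(world_vertex - camera)

print(naive)      # offset lost
print(relative)   # close to 0.123
```

The subtraction happens where the precision is (the CPU, in doubles), and the GPU only ever sees small numbers near the origin, where 32-bit floats are dense.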

Regarding 64-bit double vs. 64-bit fixed point, I don't think there is a really good reason to bother with fixed point: it adds more instructions and requires a custom debug visualizer to inspect the values.
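For illustration, a minimal fixed-point sketch in Python, assuming a Q32.32 layout (32 integer bits, 32 fractional bits); the layout and the helper names are my own choice, not anything from the thread. Python integers are arbitrary precision, so the wide intermediate product that needs special handling in C or SIMD code comes for free here.

```python
FRAC_BITS = 32            # assumed Q32.32 layout
ONE = 1 << FRAC_BITS

def to_fixed(x: float) -> int:
    """Convert a float to the raw fixed-point integer."""
    return round(x * ONE)

def to_float(f: int) -> float:
    """Convert a raw fixed-point integer back to a float, for inspection."""
    return f / ONE

def fixed_mul(a: int, b: int) -> int:
    """Multiply two fixed-point values: a full multiply plus a shift."""
    return (a * b) >> FRAC_BITS

x = to_fixed(1.5)
y = to_fixed(2.25)
product = fixed_mul(x, y)

print(product)            # raw value, not readable without decoding
print(to_float(product))  # decoded value
```

The multiply needs an extra shift that a double multiply does not, and the raw integer is meaningless in a debugger until something like to_float decodes it, which is the visualizer problem mentioned above.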

Bit shifts, at least in SSE/AVX2 etc., can only run on a single port, so they actually aren't such a great idea (not sure about scalar; I don't bother to optimize scalar code in this way).