Comment by jhj

2 months ago

Unlike quantization, dimensionality reduction/low-rank approximation, distillation, etc., lossless compression is an always-correct addition to any ML system, since you are computing the same thing you did before. The only questions are whether it is fast enough to avoid causing substantial bottlenecks and whether the achievable compression ratio is high enough to be useful.
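To make the "computing the same thing" point concrete, here is a minimal sketch using only the Python standard library. The Gaussian stand-in weights, the helper names, and the choice of zlib are assumptions for illustration, not what any particular ML system uses. It checks that the decompressed bytes are bit-identical to the original and reports the ratio achieved, which is the second of the two questions above.

```python
import random
import struct
import zlib


def pack_float32(values):
    """Serialize a list of Python floats as little-endian float32 bytes."""
    return struct.pack(f"<{len(values)}f", *values)


def roundtrip(raw):
    """Compress and decompress, returning the restored bytes and the ratio."""
    compressed = zlib.compress(raw, 6)
    restored = zlib.decompress(compressed)
    return restored, len(raw) / len(compressed)


# Hypothetical stand-in for model weights: small zero-mean Gaussian values.
rng = random.Random(0)
weights = [rng.gauss(0.0, 0.02) for _ in range(50_000)]

raw = pack_float32(weights)
restored, ratio = roundtrip(raw)

# Lossless means bit-identical: anything computed from `restored`
# is exactly what would have been computed from `raw`.
assert restored == raw
print(f"compression ratio: {ratio:.3f}")
```

A general-purpose byte compressor like this is only a baseline. Whether the ratio and the decode speed are good enough is workload-dependent.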

Floating point is just an inefficient use of bits (due to excessive dynamic range), especially during training, so lossless compression will always be welcome there. Extreme quantization techniques (some of the <= 4-bit methods, say) also tend to increase entropy in the weights, which limits the applicability of lossless compression, so lossless and lossy compression (e.g., quantization) sometimes work against each other.
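A rough way to see the dynamic range point is to measure the empirical Shannon entropy of the 8-bit float32 exponent field. The sketch below assumes zero-mean Gaussian values as a stand-in for real weights or activations (an assumption, real tensors will differ), and the function names are made up for illustration. If the values cluster in a narrow range of magnitudes, the measured entropy should come out well under the 8 bits the format spends on the exponent.

```python
import math
import random
import struct
from collections import Counter


def float32_exponent(x):
    """Return the 8-bit biased exponent field of x stored as float32."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return (bits >> 23) & 0xFF


def shannon_entropy_bits(symbols):
    """Empirical Shannon entropy, in bits per symbol."""
    counts = Counter(symbols)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical stand-in for weights: small zero-mean Gaussian values.
rng = random.Random(0)
weights = [rng.gauss(0.0, 0.02) for _ in range(100_000)]

exponent_entropy = shannon_entropy_bits(float32_exponent(w) for w in weights)
print(f"exponent entropy: {exponent_entropy:.2f} bits (field width: 8 bits)")
```

The gap between the field width and the measured entropy is what an entropy coder applied to the exponents could in principle recover. The mantissa bits are usually much closer to random.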

If you have billions of dollars in inference devices, even reducing the number of devices you need for a given workload by 5% is very useful.

4 comments


danielmarkbruce 2 months ago

"always correct"...

Dylan16807 2 months ago
Yes. It doesn't change the output, so it is a correct optimization.
- danielmarkbruce 2 months ago
  
  Except it's being used in a situation where correctness isn't important. A close approximation is more than fine. In fact, an approximation might be better because it's more generalizable.
  Hence, it's a bs thing to say. And it sounds clever - the worst type of bs.
  