Comment by fooker
9 hours ago
> I still don't know exactly what you mean
Straightforward quantization, just to one bit instead of 8, 16, or 32. Training a one bit neural network from scratch is apparently an unsolved problem though.
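To make "quantize to one bit" concrete, here is a minimal sketch, assuming plain sign quantization (each weight becomes +1 or -1) applied to weights that were already trained. The function names are made up for illustration, and real schemes often add per-layer scale factors that are left out here. It also shows why the result is cheap to evaluate: a dot product of two ±1 vectors reduces to an XOR and a popcount.

```python
def binarize(weights):
    """Map each real-valued weight to +1 or -1 by its sign (0 maps to +1)."""
    return [1 if w >= 0 else -1 for w in weights]


def pack(bits):
    """Pack a {-1,+1} vector into an int, with bit i set where bits[i] == +1."""
    return sum(1 << i for i, b in enumerate(bits) if b == 1)


def binary_dot(a_bits, b_bits):
    """Reference dot product of two {-1,+1} vectors."""
    return sum(a * b for a, b in zip(a_bits, b_bits))


def xor_popcount_dot(a_packed, b_packed, n):
    """Same dot product on packed ints: n minus twice the number of mismatches."""
    mismatches = bin(a_packed ^ b_packed).count("1")
    return n - 2 * mismatches


weights = [0.7, -0.2, 0.0, -1.3]   # hypothetical trained weights
inputs = [1, 1, -1, -1]            # an already-binary input vector

w_bits = binarize(weights)
slow = binary_dot(w_bits, inputs)
fast = xor_popcount_dot(pack(w_bits), pack(inputs), len(w_bits))
```

Both `slow` and `fast` compute the same value. The packed version is the one that maps onto single machine instructions.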
> The trees that correspond to the neural networks are huge.
Yes, if the task is inherently 'fuzzy'. Many neural networks are effectively large decision trees in disguise, and those are the ones that have potential with this kind of approach.
> Training a one bit neural network from scratch is apparently an unsolved problem though.
I don't think it's correct to call it unsolved. The established methods are much less efficient than those for "regular" neural nets but they do exist.
Also note that the usual approach when going binary is to make the units stochastic. https://en.wikipedia.org/wiki/Boltzmann_machine#Deep_Boltzma...
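For anyone unfamiliar with stochastic binary units, a rough sketch, assuming the usual logistic form used in Boltzmann machines: the unit outputs 1 with probability sigmoid(input) and 0 otherwise. `sample_unit` and the seed are just illustrative choices.

```python
import math
import random


def sigmoid(z):
    """Logistic function mapping a real input to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def sample_unit(pre_activation, rng):
    """Return 1 with probability sigmoid(pre_activation), else 0."""
    return 1 if rng.random() < sigmoid(pre_activation) else 0


rng = random.Random(0)  # fixed seed so the run is repeatable
samples = [sample_unit(1.0, rng) for _ in range(10_000)]
firing_rate = sum(samples) / len(samples)
```

Each individual output is a single bit, but the average firing rate tracks the underlying real-valued probability, which is what lets a binary unit still carry graded information.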
Interesting.
By unsolved I guess I meant: this looks like it should be easy and efficient but we don't know how to do it yet.
Usually this means we are missing some important science in the classification/complexity of problems. I don't know what it could be.
Perhaps. It's also possible that the approach simply precludes the use of the best tool for the job. Backprop is quite powerful and it just doesn't work in the face of heavy quantization.
Whereas if you're already using evolution strategies, a genetic algorithm, or similar, then I don't expect changing the bit width (or pretty much anything else) to make any difference to the overall training efficiency. That efficiency is presumably already abysmal outside of a few specific domains, such as RL applied to a sufficiently ambiguous continuous control problem.
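A toy illustration of both halves of that point, under heavy assumptions. The first part measures the gradient of a loss through a hard threshold with a finite difference (standing in for backprop) and gets zero, so there is nothing to descend. The second part is a bare-bones mutate-and-keep search over ±1 weights, a much simpler cousin of evolution strategies, which never looks at a gradient and so does not care about bit width. The teacher vector, dataset, and function names are all invented for the example.

```python
import itertools
import random


def sign(v):
    """Hard threshold: the one-bit activation."""
    return 1 if v >= 0 else -1


def numeric_grad(f, w, eps=1e-4):
    """Central finite difference, standing in for a backprop gradient."""
    return (f(w + eps) - f(w - eps)) / (2 * eps)


def toy_loss(w):
    """Squared error of a single thresholded weight against target -1."""
    return (sign(w) - (-1)) ** 2


def errors(w, data):
    """Count misclassified examples for a {-1,+1} weight vector."""
    return sum(sign(sum(a * b for a, b in zip(w, x))) != y for x, y in data)


def bit_flip_search(data, n, steps=200, seed=0):
    """Flip one random weight per step; keep the flip unless it adds errors."""
    rng = random.Random(seed)
    w = [rng.choice((-1, 1)) for _ in range(n)]
    history = [errors(w, data)]
    for _ in range(steps):
        i = rng.randrange(n)
        w[i] = -w[i]                      # mutate one bit
        e = errors(w, data)
        if e <= history[-1]:
            history.append(e)             # keep the mutation
        else:
            w[i] = -w[i]                  # revert it
            history.append(history[-1])
    return w, history


# Hypothetical teacher that labels every 5-bit input; odd length avoids ties.
TEACHER = [1, -1, 1, 1, -1]
DATA = [
    (x, sign(sum(a * b for a, b in zip(TEACHER, x))))
    for x in itertools.product((-1, 1), repeat=5)
]

grad = numeric_grad(toy_loss, 0.5)        # zero: the loss is flat around w = 0.5
weights, history = bit_flip_search(DATA, n=5)
```

`history` can only stay flat or go down, but each step costs a full pass over the data for a single bit of progress at best, which is the inefficiency being described.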
> Training a one bit neural network from scratch is apparently an unsolved problem though.
It was until recently, but there is a new method that trains them directly without any floating-point math, using "Boolean variation" instead of Newton/Leibniz differentiation:
https://proceedings.neurips.cc/paper_files/paper/2024/hash/7...
Nice!