Comment by cubefox

4 months ago

I don't find this approach very interesting, because it's just quantization of a full-precision model. It speeds up inference (at a quality penalty) but not training. It would be more interesting to train an actually binary model directly, without any floating-point multiplication, like in this paper: https://proceedings.neurips.cc/paper_files/paper/2024/hash/7...
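
A rough sketch of what "just quantization" means here, assuming a simple sign-plus-scale scheme (the names `binarize` and `binary_dot` are made up for illustration, and this is not necessarily the scheme used by the approach under discussion or by the linked paper). The weights are first obtained in full precision and only binarized afterwards; the dot product then needs only additions and subtractions plus one final multiply by the scale:

```python
import random


def binarize(weights):
    """Post-training binarization: keep each weight's sign plus one shared scale.

    The scale is the mean absolute weight (an assumed, commonly used choice).
    """
    scale = sum(abs(w) for w in weights) / len(weights)
    signs = [1 if w >= 0 else -1 for w in weights]
    return signs, scale


def binary_dot(signs, scale, x):
    """Dot product with +/-1 weights: add or subtract each input, then scale once."""
    acc = 0.0
    for s, xi in zip(signs, x):
        acc = acc + xi if s > 0 else acc - xi
    return scale * acc


if __name__ == "__main__":
    random.seed(0)
    # Stand-in for weights that were already trained in full precision.
    w = [random.gauss(0.0, 1.0) for _ in range(256)]
    x = [random.gauss(0.0, 1.0) for _ in range(256)]

    exact = sum(wi * xi for wi, xi in zip(w, x))
    signs, scale = binarize(w)
    approx = binary_dot(signs, scale, x)

    # The gap between the two values is the quality penalty of binarizing after the fact.
    print("full precision:", exact)
    print("binarized:     ", approx)
```

In this sketch only the inference-time dot product gets cheaper. Producing `w` in the first place would still take ordinary floating-point training, which is the part a directly trained binary model would avoid.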

0 comments