Comment by michaelt
14 days ago
No need to unpack for inference. Since things like CUDA kernels are fully programmable, you can write them to work directly on packed 4-bit integers, no problem at all.