Comment by api
5 hours ago
Read the headline and thought it rescaled LLMs down for your hardware. That would be fascinating but would degrade performance.
Any work on that? Say I have 64GB of memory and I want to run a 256B-parameter model. At 4-bit quantization that's 128 gigs and usually works well; 2 bits usually degrades it too much. But what if you could lose data instead of precision? It would probably imply a fine-tuning run afterward, so very compute intensive.
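The arithmetic above can be sketched as a weights-only estimate (a simplification: KV cache, activations, and runtime overhead add more on top):

```python
# Back-of-envelope memory footprint for quantized model weights.
# Weights-only estimate; real deployments also need KV cache and activations.

def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate weight storage in decimal gigabytes."""
    return num_params * bits_per_param / 8 / 1e9

# A 256B-parameter model at various precisions:
for bits in (16, 8, 4, 2):
    print(f"{bits}-bit: {weight_memory_gb(256e9, bits):.0f} GB")
# 16-bit: 512 GB, 8-bit: 256 GB, 4-bit: 128 GB, 2-bit: 64 GB
```

This shows why 4-bit is the sweet spot in the example: 128 GB fits on a large workstation but not in 64GB, while the 2-bit version that would fit tends to degrade quality too much.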