
Comment by embeddnet

3 days ago

> Rest assured, all the big players (OpenAI, Google, DeepSeek, etc.) have run countless experiments with 4, 3, 2, 1.58, and 1 bits, and various sparsity factors and shapes. This barrel has been scraped to the bottom.

I have doubts about this. Perhaps the closed labs have, but I wouldn't be so sure about the open ones.

GLM 5, for example, ships 16-bit weights natively. At 2 bytes per parameter, its 755B parameters come to ~1.5 TB on disk, and its ~40B active parameters mean ~80 GB of weights streamed per token.

Compare this to Kimi K2.5: a 1T-parameter model, but with 4-bit (int4) weights, so the full model is ~560 GB and its 32B active parameters are only ~16 GB per token.
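The arithmetic is just params × bits ÷ 8. A quick back-of-envelope (ignoring embedding tables, any mixed-precision layers, and KV cache, which is why real checkpoints like Kimi's ~560 GB come out a bit larger than the raw figure):

```python
def weight_gb(n_params: float, bits_per_weight: float) -> float:
    """Raw weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9

# GLM 5: 755B total / ~40B active, 16-bit weights
print(weight_gb(755e9, 16))  # 1510.0 -> ~1.5 TB total
print(weight_gb(40e9, 16))   # 80.0   -> ~80 GB read per token

# Kimi K2.5: 1T total / 32B active, int4 weights
print(weight_gb(1e12, 4))    # 500.0  -> ~500 GB raw (~560 GB shipped)
print(weight_gb(32e9, 4))    # 16.0   -> ~16 GB read per token
```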

Sure, GLM 5 is the stronger model, but is that worth roughly 2-3x longer generation times? And 2-3x the memory? Decoding is largely memory-bandwidth-bound, so every extra byte of active weights streamed per token slows generation directly.
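To make the int4 side concrete, here's a minimal sketch of symmetric per-group 4-bit quantization. This is a generic scheme for illustration, not necessarily what Kimi actually uses; the group size and clipping range are assumptions:

```python
import numpy as np

def quantize_int4(w: np.ndarray, group: int = 128):
    """Symmetric per-group int4 quantization (illustrative sketch).

    Each group of 128 weights shares one fp scale; the quantized
    values fit in [-8, 7], i.e. 4 bits each -> 4x smaller than fp16.
    """
    g = w.reshape(-1, group)
    scale = np.abs(g).max(axis=1, keepdims=True) / 7.0
    q = np.clip(np.round(g / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    return q * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 128)).astype(np.float32)
q, s = quantize_int4(w)
err = np.abs(dequantize(q, s).reshape(w.shape) - w).max()
# Storage drops 4x vs fp16; the cost is a small rounding error
# bounded by half a quantization step (scale / 2) per weight.
```

The trade the open labs are making is exactly this: accept the rounding error in exchange for 4x less memory traffic per token.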

I think this barrel's bottom really hasn't been scraped.