Comment by Aerroon

2 days ago

I have doubts about this. Perhaps the closed models have, but I wouldn't be so sure for the open ones.

GLM 5, for example, runs 16-bit weights natively. That makes their 755B model ~1.5 TB in size, and the 40B active parameters alone come to ~80 GB.

Compare this to Kimi K2.5. It's a 1T model, but with 4-bit (int4) weights, which makes the model ~560 GB. Its 32B active parameters are ~16 GB.
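
For anyone who wants to check the arithmetic, here is a rough back-of-the-envelope sketch. The helper name `weight_gb` is made up for illustration, and it only counts raw weight bytes (parameters times bits per weight), so it ignores KV cache, activations, and any layers stored at a different precision. That is presumably why the naive figure for Kimi K2.5 comes out a bit below the ~560 GB quoted above.

```python
def weight_gb(params_billion: float, bits_per_weight: int) -> float:
    """Approximate raw weight storage in GB (1 GB = 1e9 bytes).

    Hypothetical helper: parameters * bits / 8, nothing else.
    """
    return params_billion * bits_per_weight / 8


# Parameter counts and bit widths are the ones quoted in this comment.
models = {
    "GLM 5 (total, 16-bit)": (755, 16),
    "GLM 5 (active, 16-bit)": (40, 16),
    "Kimi K2.5 (total, int4)": (1000, 4),
    "Kimi K2.5 (active, int4)": (32, 4),
}

for name, (params_b, bits) in models.items():
    print(f"{name}: ~{weight_gb(params_b, bits):.0f} GB")
```

The active-parameter number is the one that matters for generation speed, since those are the weights that have to be read for every token.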

Sure, GLM 5 is the stronger model, but is that worth the price of 2-3x longer generation times? What about 2-3x more memory?

I think this barrel's bottom really hasn't been scraped.