Comment by cmrdporcupine
16 hours ago
Use them via DeepInfra instead of z.ai. No reliability issues.
https://deepinfra.com/zai-org/GLM-5.1
Looks like fp4 quantization now though? Last week was showing fp8. Hm..
Deepinfra's implementation of it is not correct: thinking isn't preserved in the responses, and they haven't responded to the issue I submitted about it.
I also regularly see Deepinfra slow to an absolute crawl - I've actually gotten more consistent performance from z.ai.
I really liked Deepinfra, but something doesn't seem right over there at the moment.
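For anyone who wants to check this on their own provider, here's a rough sketch. It assumes an OpenAI-compatible chat-completions response shape and that the model's thinking comes back in a `reasoning_content` field on the assistant message - both assumptions worth verifying against the provider's docs, since field names vary between hosts:

```python
# Hypothetical check for whether a serving provider preserves a model's
# thinking. Assumes the reasoning arrives in a "reasoning_content" field
# on the assistant message (field name varies by provider - verify it).

def has_reasoning(message: dict) -> bool:
    """Return True if the assistant message carries non-empty thinking."""
    reasoning = message.get("reasoning_content") or ""
    return bool(reasoning.strip())

# Example messages: one where thinking is preserved, one where it's dropped.
preserved = {
    "role": "assistant",
    "content": "The answer is 42.",
    "reasoning_content": "First, consider what the user is asking...",
}
dropped = {"role": "assistant", "content": "The answer is 42."}

print(has_reasoning(preserved))  # True
print(has_reasoning(dropped))    # False
```

Running this against a few real responses from each host would make it easy to confirm whether the reasoning is actually being dropped or just surfaced under a different field name.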
Damn. Yeah, that sucks. I played with it again earlier and it did seem to slow down.
It's frankly a bummer that there's seemingly no better serving option for GLM 5.1 than z.ai, which seems to have reliability and cost issues of its own.