Comment by hasperdi

2 days ago

Correction: my GLM-4.6 models are not Q4; I can only run lower quants, e.g.:

- https://huggingface.co/unsloth/GLM-4.6-GGUF/blob/main/GLM-4.... - 84GB, Q1
- https://huggingface.co/unsloth/GLM-4.6-REAP-268B-A32B-GGUF/t... - 92GB, Q2

I make sure there's enough RAM left over (i.e. a limited context window setting), so there's no swapping.

As for GLM-4.5-Air, I run that daily, switching between noctrex/GLM-4.5-Air-REAP-82B-A12B-MXFP4_MOE-GGUF and kldzj/gpt-oss-120b-heretic.

Are you getting any agentic use out of gpt-oss-120b?

I can't tell if it's some bug regarding message formats or if it's just genuinely giving up, but it failed to complete most tasks I gave it.

  • GPT-oss-120B was also completely failing for me, until someone on reddit pointed out that you need to pass the reasoning tokens back to the model when generating the next response. One way to do this is described here:

    https://openrouter.ai/docs/guides/best-practices/reasoning-t...

    Once I did that it started functioning extremely well, and it's the main model I use for my homemade agents.

    Many LLM libraries/services/frontends don't pass these reasoning tokens back to the model correctly, which is why people complain about this model so much. It also highlights the importance of rolling these things yourself and understanding what's going on under the hood, because there are so many broken implementations floating around.
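
    For anyone hitting the same wall, here's a minimal sketch of what "passing the reasoning back" looks like against an OpenAI-compatible endpoint. The URL, model name, and prompt are placeholders, and the name of the reasoning field varies by server (OpenRouter returns "reasoning", llama.cpp's llama-server returns "reasoning_content"), so check what yours actually sends:

        import requests

        # Hypothetical local OpenAI-compatible endpoint (e.g. llama.cpp's llama-server).
        API_URL = "http://localhost:8080/v1/chat/completions"

        def chat(messages):
            resp = requests.post(API_URL, json={"model": "gpt-oss-120b", "messages": messages})
            resp.raise_for_status()
            return resp.json()["choices"][0]["message"]

        messages = [{"role": "user", "content": "Plan the next step of the task."}]
        assistant = chat(messages)

        # The fix: keep the reasoning field when appending the assistant turn
        # to the history. Silently dropping it is what breaks agentic use.
        turn = {"role": "assistant", "content": assistant.get("content", "")}
        for key in ("reasoning", "reasoning_content"):  # field name depends on the server
            if assistant.get(key):
                turn[key] = assistant[key]
        messages.append(turn)
        # ...then append the tool result / next user message and call chat() again.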