Comment by CuriouslyC

10 hours ago

The difference here is that those small models are impressive, but not super useful. Deepseek 4 is impressively cheap for the intelligence, but not reliable enough to daily drive unless your time has low value.

GLM passes a meaningful threshold of reliability/utility that puts it in a different category for real work, just like Opus really took off after passing a threshold with 4.5. GLM is the first open model to do that.

Qwen3.6-27b is surprisingly good at tasks that involve modifying an existing repo by analogy with the existing code. For example, you have an existing CRUD app and want to add a new domain model and expose it via the API. Qwen3.6 analyzes how things are done in the project and makes it work flawlessly in one shot, and the code is more or less what you expected. Qwen3.6 only struggles with non-trivial code or when you bootstrap a project from scratch. But that doesn't happen often.

I once gave Sonnet 4.6 and Qwen3.6-27b the same task in this style: extend the existing code with this new requirement. Qwen3.6-27b perfectly followed the conventions, while Sonnet 4.6 invented its own conventions, which were rejected during code review by another dev. Qwen3.6-27b, run locally, also managed to finish faster on that task.

Qwen models are super useful for those running locally.

And there are valid reasons to run locally, even if performance (quality and speed) isn't the best.