
Comment by neutrinobro

13 hours ago

We might already be there. I've been running Qwen-3.6-27B with 8-bit quantization locally via llama.cpp (~100k context window), and honestly, for my use case it's more usable than claude-code 40-50% of the time. I'm only on the $20/mo plan, so I often hit rate limits after 2-3 prompts. The local model is slower, but it just keeps chugging, is practically free, and more often than not produces code on par with claude's. I wouldn't be surprised if in 6-12 months we have local models comparable to opus 4.6, which I'd personally consider the tipping point where agentic coding becomes practical.
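For anyone wanting to try a similar setup: a sketch of what the launch looks like with llama.cpp's `llama-server`. The model filename and `-ngl` value below are placeholders, not the commenter's actual config — adjust for your own GGUF file and hardware.

```shell
# Illustrative llama.cpp launch; model path and layer count are
# placeholders, not the commenter's exact setup.
llama-server \
  -m ./Qwen-27B-Q8_0.gguf \
  -c 100000 \
  -ngl 99 \
  --port 8080
# -m    path to the 8-bit (Q8_0) quantized GGUF weights
# -c    context window in tokens (~100k as described above)
# -ngl  number of layers to offload to the GPU (99 = everything that fits)
```

Once running, it exposes an OpenAI-compatible endpoint at http://localhost:8080, so most coding-agent tools can point at it instead of a hosted API.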