Comment by seanmcdirmid

21 days ago

You can check out https://news.ycombinator.com/item?id=43856489

Copilot comparison:

Intelligence: Qwen2.5-Coder-32B is widely considered the first open-source model to reach GPT-4o and Claude 3.5 Sonnet levels of coding proficiency. While Copilot (using GPT-4o) remains highly reliable, Qwen often produces more concise code and can outperform cloud models in specific tasks like code repair.

Latency: Local execution on an M3 Max provides near-zero network latency, resulting in faster "start-to-type" responses than Copilot, which must round-trip to the cloud.

Reliability & Integration: Copilot is an all-in-one experience that integrates deeply into VS Code. Qwen requires local tooling such as Ollama or MLX-LM, plus an editor plugin like Continue.dev, to approach the same UX.
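For the curious, the local setup is roughly this (a sketch assuming Ollama is installed and the model tag `qwen2.5-coder:32b` is available in the Ollama library; exact tags may differ):

```shell
# Pull the model weights locally (a ~20 GB download at 4-bit quantization)
ollama pull qwen2.5-coder:32b

# Ollama serves an OpenAI-compatible API on localhost:11434 by default,
# which is what plugins like Continue.dev point at
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "qwen2.5-coder:32b",
        "messages": [{"role": "user", "content": "Write a binary search in Python"}]
      }'
```

From there, Continue.dev is configured to use the Ollama provider with that model name, giving inline completion and chat inside VS Code without any cloud round-trip.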

GPT-Codex:

Intelligence & Reasoning: In recent 2025–2026 benchmarks, the Qwen3-Coder series has emerged as the strongest open-source performer, matching the "pass@5" resolution rates of flagship models like GPT-5-High. While OpenAI’s latest GPT-5.1-Codex-Max remains the overall leader in complex, project-wide autonomous engineering, Qwen is frequently cited as the better choice for local, file-specific logic.

Architecture & Efficiency: OpenAI models like GPT-OSS-20b (a Mixture-of-Experts model) are optimized for extreme speed and tool-calling. However, the M3 Max with 64GB is powerful enough to run the Qwen3-Coder-30B or 32B models at 4–8-bit quantization, which provides superior logic to OpenAI's smaller "mini" or "OSS" models.
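The back-of-the-envelope memory math for why 64GB is the right size: weights take `params × bits / 8` bytes, plus some headroom for the KV cache and runtime buffers. A quick sketch (the 20% overhead factor is a ballpark assumption, not a measurement):

```python
def model_memory_gb(params_billion: float, bits_per_weight: float,
                    overhead: float = 1.2) -> float:
    """Rough memory footprint: weight storage plus ~20% assumed
    headroom for KV cache and runtime buffers."""
    weight_bytes = params_billion * 1e9 * bits_per_weight / 8
    return weight_bytes * overhead / 1e9

# A 32B model at common quantization levels on a 64 GB machine:
for bits in (16, 8, 4):
    print(f"{bits}-bit: ~{model_memory_gb(32, bits):.0f} GB")
```

So FP16 weights alone blow past 64GB, but 8-bit (~38GB) and 4-bit (~19GB) both fit comfortably with room left for a long context.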

Context Window: Qwen models offer substantial context (up to 128K–256K tokens), which is comparable to OpenAI’s specialized Codex variants. This allows you to process entire modules locally without the high per-token cost of sending that data to OpenAI's servers.
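To make "entire modules" concrete, you can sanity-check whether a set of source files fits in the window using the common ~4-characters-per-token rule of thumb (real tokenizer counts vary, especially for code; this is only an estimate):

```python
from pathlib import Path

CHARS_PER_TOKEN = 4  # rough heuristic; actual tokenization varies

def estimate_tokens(text: str) -> int:
    """Approximate token count from character length."""
    return len(text) // CHARS_PER_TOKEN

def fits_in_context(paths, context_tokens=128_000, reserve=8_000):
    """Return (estimated total tokens, fits?) for a set of source files,
    reserving room for the prompt and the model's response."""
    total = sum(estimate_tokens(Path(p).read_text(errors="ignore"))
                for p in paths)
    return total, total <= context_tokens - reserve
```

By this estimate, a 128K window holds on the order of 500KB of source, which is why whole-module prompts are free locally but get expensive fast when billed per token in the cloud.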