
Comment by trollbridge

2 months ago

... so let me understand this.

It is frequently said that programming directly is obsolete, and the skill you must have now is knowing how to operate agentic AIs.

Yet you aren't allowed to do this until you're 18.

So, developing software is now 18+ only?

Qwen3 runs locally on reasonable hardware and is comparable to a mid-2025 Claude Sonnet (albeit possibly rather slower).

Local models are chasing the online frontier models pretty hard.

So worst case, that's the fallback (FWIW, YMMV).

edit: specifically Qwen-3.5 MoE (and other local MoE models like it)

  • What's "reasonable hardware"?

    • People have tried to run Qwen3-235B-A22B-Thinking-2507 on four used Nvidia 3090s at about $600 each, with 24 GB of VRAM apiece (96 GB total). It runs, but it is too slow for production use (<8 tokens/second). So we're already at $2,400 before you've bought system memory and a CPU, and it's still too slow to count as a "Sonnet equivalent" setup...

      You can quantize it, of course, but if the goal is "as close to Sonnet as possible," then while quantized models are objectively more efficient, they sacrifice precision to get there.

      So the next step is to increase that speed: four Nvidia 5090s at about $1,300 each, with 32 GB of VRAM apiece (128 GB total), or $5,200 before RAM, CPU, etc. All of that extra cost buys more tokens/second without lobotomizing the model, and it still may not be enough.

      I guess my point is: you see this conversation a LOT online. "Qwen3 can be near Sonnet!" But when asked how, instead of answering for the model that is actually "near Sonnet" on benchmarks, people suddenly start talking about a substantially weaker Qwen3 model that is cheap to run at home (e.g., 27B/30B quantized down to Q4/Q5).

      The local models that are "near Sonnet" absolutely DO exist. The hardware to actually run them is the bottleneck, and it is a HUGE financial and practical one. A $10K all-in budget isn't actually insane for this class of model, and the sky really is the limit (again, to reduce quantization and/or increase tokens/second).

      PS: Electricity costs are also non-trivial for 4x 3090s or 4x 5090s.


    • A machine with 128GB of unified system RAM will run reasonable-fidelity quantizations (4-bit or more).

      If you ever want to answer this type of question yourself, look at the size of the model files. Loading a model usually takes about as much RAM as the file occupies on disk, plus a few gigabytes for the context window.

      Qwen3.5-122B-A10B is 120GB. Quantized to 4 bits, it is ~70GB. You can run a 70GB model in 80GB of VRAM or in 128GB of ordinary unified RAM.

      Systems with that capability cost a few thousand USD new.

      If you are willing to sacrifice some performance, you can take advantage of the model being a mixture-of-experts and use disk space to get by with less RAM/VRAM, but inference speed will suffer.

    • If you want something off the shelf get a MacBook Pro M5 (base "Pro" CPU) with 48GB RAM:

      Gemma 4 31B Q6: 9 tok/s. I'd say it's smarter than GPT-4o, but yes, it's slow. Good for coding.

      Gemma 4 26B A4B Q4: 50 tok/s. Feels faster than ChatGPT 5.4, but not as smart (since it reasons less). Good for general chatting and research.
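The sizing arithmetic in the replies above (total VRAM and GPU cost, "file size plus a few GB," tokens per second) can be sketched in a few lines of Python. This is a rough back-of-the-envelope estimate, not a measurement: the flat 4 GB overhead for context and runtime and the ~4.5 effective bits per weight for a "4-bit" quant are assumed, illustrative values, and the GPU prices are the figures quoted in the thread.

```python
# Back-of-the-envelope sizing for running a model locally.
# Assumptions (illustrative, not from any tool): weights take
# params * bits_per_weight / 8 bytes, plus a flat overhead for the
# context window and runtime. Real usage varies by runtime and context length.


def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight size in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


def memory_needed_gb(params_billion: float, bits_per_weight: float,
                     overhead_gb: float = 4.0) -> float:
    """Weights plus an assumed flat allowance for context/KV cache."""
    return weights_gb(params_billion, bits_per_weight) + overhead_gb


def rig(gpus: int, vram_each_gb: int, price_each_usd: int) -> tuple[int, int]:
    """Total VRAM (GB) and GPU-only cost (USD) for a multi-GPU box."""
    return gpus * vram_each_gb, gpus * price_each_usd


def seconds_for(tokens: int, tokens_per_second: float) -> float:
    """Wall-clock time to generate `tokens` tokens at a steady rate."""
    return tokens / tokens_per_second


if __name__ == "__main__":
    rigs = {
        "4x RTX 3090 (used)": rig(4, 24, 600),
        "4x RTX 5090": rig(4, 32, 1300),
    }
    # 4.5 bits/weight is an assumed effective rate for a "4-bit" quant,
    # since quantized formats also store per-block scale factors.
    models = {
        "Qwen3-235B-A22B @ ~4.5 bpw": memory_needed_gb(235, 4.5),
        "Qwen3.5-122B-A10B @ ~4.5 bpw": memory_needed_gb(122, 4.5),
    }
    for name, (vram, cost) in rigs.items():
        print(f"{name}: {vram} GB VRAM, ${cost:,} in GPUs")
    for name, need in models.items():
        fits = [r for r, (vram, _) in rigs.items() if need <= vram]
        print(f"{name}: ~{need:.0f} GB needed; fits in VRAM of: {fits or 'neither'}")
    print(f"1,000 tokens at 8 tok/s: {seconds_for(1000, 8):.0f} s")
    print(f"1,000 tokens at 50 tok/s: {seconds_for(1000, 50):.0f} s")
```

Under these assumptions, a 4-bit Qwen3-235B-A22B does not fit entirely in either rig's VRAM, so some weights would have to live in system RAM, while the 122B model fits in 80GB. Actual requirements depend on the runtime, the quantization format, and the context length.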

> It is frequently said that programming directly is obsolete

Who says this?

  • The CEO of the very company imposing the age limit, for one.

    • I would hope most people can recognize that someone trying to sell you something might be among the least trustworthy sources about that thing.

  • I mean, you can disagree with the sentiment (I certainly do), but there are still an awful lot of people saying it.

    • I thought it was true too, for a couple of months. Then the honeymoon phase ended and now I only use Claude to write commit message drafts (which I rewrite myself) and review PRs.

Yes, today’s kids should instead learn to be influencers.

This is genuine advice I’ve seen from high-profile business types. We’re fucked in the sense that our children will be pushed into being attention whores online.

It seems out of step and foolish. The cynic in me says Anthropic has a side hustle in identity harvesting and is looking for justifications. On the flip side, there is a real risk of pearl-clutching if a child ever uses AI, and maybe Anthropic just wants to steer clear of all that. Still, simply putting it in the ToS should be sufficient legal shielding, and the idea that they're harvesting chats to age-fingerprint conversations seems dubious.