
Comment by wasabinator

1 month ago

This should be a warning to those who feel that it's ok to offload your creativity to a subscription service. Always need a local model in some form.

I keep telling them and they still want to spend money on tokens at the Anthropic casino, even though they are egregiously price gouging and applying upper limits so you spend more on tokens.

Sometimes you can't help gamblers who want to gamble on tokens to hit the jackpot on fixing a typical issue which can be done by local models or even reading the documentation.

You could judge the costs of the AI products you're using by the standard API pricing, not promotional subscription offers.

  • For me, it's not even cost necessarily. If they decide to change the product they offer, the old one is gone. I refuse to use anything for personal use that's not at least _available_ as model weights.

    • Bingo. The same scenario with IoT hardware/software requirements and ever-changing software (and firmware) updates, where features get added or removed, would have so many here up in arms!

      Oddly, fewer (vocal) arms in this specific case.

  • Not even that way, given that the price is still highly subsidized by investors and circular deals.

Local models are not comparable to the FOTA models at all. I know what I'm saying because I have 4 local H100s in my server and could run the very best local models. It's night and day. They are unusable and stupid.

  • I get perfectly acceptable results from a Strix Halo PC the size of a shoebox, man. An APU that uses ~150 W, has 0 discrete GPUs, and a bill of $0/month. What's more, it doesn't go down every week, limit use, or change the terms on a whim.

    I'll burn/discard 'frontier' tokens (at work) only because they're mandated and they foot the bill. I'd rather resell them; meet the asinine requirement from $EMPLOYER, provide cover for outsourcing to my equipment, and get a return for the hassle.

    TLDR: perhaps you're holding it wrong or haven't tried the latest, as we so often hear. That's a lot of GPU for not much utility.

    • Well, my Python and TypeScript folks are also happy with the simpler local models. But I'm using more advanced stuff: C/C++ embedded real-time, vision AI, and compilers.


I’ve been trying to bring this up at my work: you’re putting all your intelligence into a service you don’t own. What do you do when it’s down or they quadruple the price?

This doesn’t affect existing users.

This is a simple supply and demand curve.

Higher demand means the price goes up. This has been true since before SaaS and before computers.

Are there local models that are anywhere near as good at coding as Opus 4.6?

  • Not really. Qwen 3.5 and Gemma and a couple of others are quite good though, and the quants are _very_ runnable on a good GPU.

The 'local model' is called your brain.

  • I’m sorry but that’s just dumb. An LLM is a tool. Your brain is not a substitute for an LLM in the same way your fingers are not a substitute for a wrench.

    The year is 2026 and if you are using your brain on chore work like one-off scripts, refactoring, boilerplate test code, then you are wasting time and money and I don’t want to work with you.

    Local models are fine for this and can do it in a fraction of the time your brain will take to even get bootstrapped.

    • The year is 2026, and the average RAM on the most common type of developer’s (web) machine is 16GB; 8GB is the lower end. Tell me, which model can one run locally on this machine?
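
One rough way to approach the RAM question is a back-of-the-envelope estimate of weight size: parameter count times bits per weight, divided by 8. The sketch below covers only the weights and ignores the KV cache and runtime overhead, so real usage will be higher. The function names and the 6 GiB reserved for the OS and other applications are illustrative assumptions, not measurements.

```python
def weights_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in GiB.

    Ignores KV cache, activations, and runtime overhead.
    """
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


def fits(params_billion: float, bits_per_weight: float,
         ram_gib: float, reserved_gib: float = 6.0) -> bool:
    """True if the weights fit in RAM after reserving memory for everything else.

    reserved_gib is an assumed allowance for the OS, browser, and editor.
    """
    return weights_gib(params_billion, bits_per_weight) <= ram_gib - reserved_gib


if __name__ == "__main__":
    # Hypothetical model sizes, chosen only to illustrate the arithmetic.
    for params, bits in [(8, 4), (8, 16), (30, 4)]:
        size = weights_gib(params, bits)
        print(f"{params}B @ {bits}-bit: {size:.1f} GiB, fits in 16 GiB: {fits(params, bits, 16)}")
```

Under these assumptions, an 8B-parameter model quantized to 4 bits leaves headroom on a 16GB machine, while the same model at 16 bits or a 30B model at 4 bits does not.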
