Comment by proxysna

2 days ago

Feels about right.

I launched internal demos of Claude Code and DeepSeek on the same day, and we burned through our monthly allowance for Claude in just over a week, with more than half of that budget spent in a single day. With DS, people can't get through that same amount of money in a month, not even close.

As a result, Claude feels like an expensive toy, while DS is a shovel, purely because developers don't feel like they are eating into a precious resource while using it. There also doesn't seem to be much of a difference in capability between Claude and DS-pro. DS-pro and flash do feel like Sonnet/Opus and Haiku respectively, but flash is still very, very capable.

I rage canceled Claude today.

After 2 weeks of Claude getting progressively worse and worse, today was the final straw.

I don't care if they have a phone app. The model is COMPLETE garbage after you subscribe long enough and they think they've "got you".

I can't code on my phone if the model literally moves in the wrong direction and does the opposite of what I tell it to. If I wanted to make my code worse, I'd just randomly commit garbage. I don't need a mobile app for that.

  • I've seen a lot of this sentiment over the past six months from people on Reddit. As a developer with over 20 years of experience, I have yet to experience this myself.

    • As always, I think this happens more to vibe coders. They don't understand that a bigger project means worse AI performance. On top of that, Opus feels like it has been nerfed at understanding prompts, so if your spec is bad you won't get a good result.

    • Opus 4.7 has been a real downgrade for me. I’m back to mid-2025, when I had to catch all the intermediary goals/assumptions the model was creating for itself.

    • I see a lot of the "4.7 is a downgrade" sentiment. 4.7 does (mostly) what you ask it to do. 4.6 does what it thinks it should do. As someone with 20 years writing my own code I want the former, but the loud contingent online wants the latter.

      On a mature codebase with 500k+ lines of code, I haven't seen anything else be as effective as 4.7.

    • It's the same phenomenon as when you learn a new vocabulary word and then see it everywhere.

      People heard "Claude is nerfed" and now they see it everywhere; they notice failures a lot more than they would have otherwise.

      Doesn't matter that Claude is not, in fact, nerfed. Perception is powerful and most humans are not rational.

    • What it does seem like is that they're tuning some knobs up and down or releasing new versions of models or system prompts that result in the model getting dumber and smarter in waves.

      Opus has been dumb this week.

      Claude was having a lot of capacity problems and downtime and then this week that has been much less obvious... and the model is dumber.

      It could also just be luck, and my impressions could be false... who knows.

    • It’s because it’s not true; there’s no evidence for it that passes the sniff test. No lab is “shipping a worse model once they’ve got you”. People have a bad few days and blame the model providers instead of stepping back to fix their workflow.

  • All these tools have near feature parity. The GitHub CLI allows remote sessions and can run Anthropic models anyway.

  • When you say "code on your phone" ... you don't mean what I think you mean, do you? Like, are you actually using your phone to make code commits?

    • Yes, you can do that with Claude Code.

      Tell it what to do.

      Commit, push to origin, review on GitHub.

      Tell it to make changes, amend the commit, push --force-with-lease.

      I'm attempting to make a memory-safe language like Rust, but with a substantially lower learning curve and added safety (though with non-zero-cost abstractions), fully with AI and almost entirely from my phone: while commuting, getting coffee, walking the dog, between sets at the gym, and in place of doom scrolling before bed and during lunch.

      Mostly to test how much LLMs can actually scale development.

      Depending on how long it takes them to clean up some architectural slop in the MIR lowering phase, the results could either be very impressive or not.

      From a purely cost basis perspective, it's hard to argue they aren't killing it.

      But from a multiplier perspective, it's up in the air how great they are.

      It's proven to be a really nice experiment, because many of the problems I wanted to solve with a language are the ones inherent to LLM development.

      So at the self-hosting phase, I get a great opportunity to see if the language can actually deliver on what I dreamed of.

Considered Gemini?

  • Gemini got a big reduction in usage limits this week. There was backlash, and they added 3x usage for Antigravity a day later, but I haven't really tried it out to get a feel for it yet.

  • Google rug-pulled Code Assist and Gemini CLI. They're moving everything to Antigravity, so we would need to reinstall all our tooling and reconfigure any automations, and the mechanism to subscribe via GCP is much clunkier.

    This was all supposed to be worked out prior to Cloud Next, but it wasn't. Ironically, they mentioned Claude in a few of their presentations at Next.

    And that was our solution. We are a big GCP customer but our whole team is on Claude now and much happier.

  • Google has burnt all of its goodwill in dev communities, so no, I don't think Gemini is worth consideration.