Comment by sunaurus

20 hours ago

Has anybody else noticed a pretty significant shift in sentiment when discussing Claude/Codex with other engineers since even just a few months ago? Specifically because of the secret/hidden nature of these changes.

I keep getting the sense that people feel like they have no idea if they are getting the product that they originally paid for, or something much weaker, and this sentiment seems to be constantly spreading. Like when I hear Anthropic mentioned in the past few weeks, it's almost always in some negative context.

Well, off the top of my head:

- Banning OpenClaw users (within their rights, of course, but bad optics)

- Banning 3rd party harnesses in general (ditto)

(claude -p still works on the sub but I get the feeling like if I actually use it, I'll get my Anthropic acct. nuked. Would be great to get some clarity on this. If I invoke it from my Telegram bot, is that an unauthorized 3rd party harness?)

- Lowering reasoning effort (and then showing up here saying "we'll try to make sure the most valuable customers get the non-gimped experience" (paraphrasing slightly xD))

- Massively reduced usage (apparently a bug?). The other day the same task cost me 21x more usage on Claude than on Codex.

- Noticed a very sharp drop in response length in the Claude app. Asked Claude about it and it mentioned several things in the system prompt related to reduced reasoning effort, keeping responses as brief as possible, etc.

It's all circumstantial but everything points towards "desperately trying to cut costs".

I love Claude and I won't be switching any time soon (though with the usage limits I'm increasingly using Codex for coding), but it's getting hard to recommend it to friends lately. I told a friend "it was the best option, until about two weeks ago..." Now it's up in the air.

  • > It's all circumstantial but everything points towards "desperately trying to cut costs".

    I have been wondering if it's more geared at reducing resource usage, given that at the moment there's a known constraint on AI datacenter expansion capability. Perhaps they are struggling to meet demand?

    • It’s more that Anthropic knows that the models themselves are non-sticky, and the real moat is in the ecosystem around it.

      It only makes sense for them to get users to use their ecosystem, rather than other tools.

      2 replies →

    • > Perhaps Anthropic is struggling to meet demand?

      Yes, definitely, they’re gracefully failing to meet demand. They could also deny new customers, but it would probably be bad for business.

      4 replies →

    • I wish they would just rip the bandaid to stop everybody's entitled whining.

      "We're sorry, what we were able to give you for $100/mo before now needs to be $200/mo (or more). We miscalculated/we were too generous/gave too much away for too little. It's a new technology, we are seeing a ton of demand, we are trying to run a business, hope you understand. If you don't want it, don't pay for it."

      14 replies →

  • Huh I've noticed that! Opus hallucinates answers and provides contradictory reasoning to me much more regularly during the past couple of weeks than it did in February.

    For example, after a question about a script that used taskset and CUDA_VISIBLE_DEVICES to handle NUMA domains on a parallel GLUE evaluation run:

    >Round 2 tasks reuse GPU slots that Round 1 may still be occupying. Cola, SST-2, and QQP are small datasets and will likely finish before MNLI, but QNLI is sizable. If QNLI (GPU 3, slot 3) is still running when Round 2 launches, that's fine — they're on different slots. But more importantly, Round 2 uses slots 5/6/7 (GPUs 7/0/1), while Round 1 uses slots 0/1/2/3/4 (GPUs 4/5/2/3/6). So there's no GPU collision — you're good here actually. Disregard, this is clean.
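
    For context, that kind of launcher looks roughly like this (a minimal sketch; run_glue.py, the task names, and the core/GPU numbering are illustrative, and the GPU-to-NUMA mapping is machine-specific):

        # Pin each GLUE task to one GPU and to CPU cores in that GPU's NUMA domain.
        CUDA_VISIBLE_DEVICES=0 taskset -c 0-15  python run_glue.py --task cola &
        CUDA_VISIBLE_DEVICES=1 taskset -c 16-31 python run_glue.py --task sst2 &
        CUDA_VISIBLE_DEVICES=2 taskset -c 32-47 python run_glue.py --task qqp  &
        CUDA_VISIBLE_DEVICES=3 taskset -c 48-63 python run_glue.py --task qnli &
        wait  # launch Round 2 only after Round 1 exits, so GPU slots can't collide

    Which is exactly the slot-reuse question Claude reasons itself in and out of above.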

  • claude -p is allowed. They're not going to give you a feature and then ban you for using it.

    What they changed is that it now draws from extra usage, which is charged at API rates.

    • "claude -p" does not charge api rates by itself, I just ran "claude -p 'write hello world to foo.txt'", and it didn't.

      What they changed is that if you have OpenClaw run 'claude -p' for you, that gets your account banned or charged API rates; and if they think your usage of 'claude -p' might be OpenClaw, even when it isn't, you also get charged API rates or banned.

      It seems so silly to me. They built a feature with one billing rate, and the feature is a bash command. If a bad program runs the bash command, you get billed at a different rate; if a good script you wrote yourself runs it, you're fine. But they have literally no legitimate way to tell the difference, since either way it's just a command being run.

      The justification going around is that OpenClaw usage is so heavy that it impacts the service for other people, but like OpenClaw was just using the "claude code max" plan, so if they can't handle the usage the plan promises, they should be changing the plan.

      If they had instead said "Your claude code max plan, which has XX quota, will get charged API rates if you consistently use 50% of your quota. The quota is actually a lie, it's just the amount you can burst up to once or twice a week, but definitely not every day" and just banned everyone that used claude code a lot, I wouldn't be complaining as much, that'd be much more consistent.

    • It only switches to charging API rates if some part of your prompt triggers their magic string detector. Lot of examples of that floating around where swapping "is" for "are" or whatever will magically allow the request against your subscription plan again.

  • > (claude -p still works on the sub but I get the feeling like if I actually use it, I'll get my Anthropic acct. nuked. Would be great to get some clarity on this. If I invoke it from my Telegram bot, is that an unauthorized 3rd party harness?)

    How often? Realistically, if you invoke it occasionally, for what's clearly an amount that's "reasonable personal use", then no you don't get nuked.

    • It’s the same problem people have with Google. If they ban you for some AI hallucinated reason you have no recourse other than going viral on Hacker News.

      2 replies →

  • Anthropic has become shady as hell in less than a few weeks. The DoD story and their overall popularity among developers gave them a huge leap over OAI, but I certainly won't renew my subscription with them. The Claude SDK feels like a constant fight against its own limitations compared to Codex and other harnesses.

  • They also screwed up the API token detection and blocked a bunch of 1st party tool users for ~24h.

    Support consisted of AI bots saying you did something stupid, you did something wrong, you were abusing the system, followed by (only when I asked for it explicitly) a claim to file a ticket with a human who would contact you later (and either that didn't happen or their ticket system is /dev/null).

    (By the way this is the 2nd time I've been "please hold" gaslit by support LLMs this exact same way, the other being with Square)

  • claude -p not working would mean an instant downgrade from Max to Pro and would further drive my use of Codex. I use both, but overall I've noticed I reach for Claude less than Codex lately because Claude keeps getting slower and slower (I have not noticed a drop-off in quality, but I use it less and less, so maybe I'm not in a good position to notice).

    Generally I find codex and claude make a good team. I'm not a heavy user, but I am currently Claude Max 5x and ChatGPT Plus. Now that OpenAI has a $100 offering and I am finding myself using Claude less, I am considering switching to Claude Pro and ChatGPT Pro x5. The work hours restriction on Claude Max x5 really pisses me off.

    I am not a heavy user. Historically I only break over 50% weekly one week a month and average about 30-40% of Max x5 over the entire month. I went Max because of the weekly limits and to access the better models and because I felt I was getting value. I need an occasional burst of usage, not 24/7 slow compute. But even for pay-as-you-go burst usage Anthropic's API prices are insane vs Max.

    I have yet to ever hit a limit on codex so it's not on my mind. And lately it seems like Claude is likely to be having a service interruption anyway. A big part of subscribing to Claude Max was to get away from how the usage limits on Pro were causing me to architect my life around 5hr windows. And now Anthropic has brought that all back with this don't use it before 2pm bullshit. I want things ready to go when the muses strike. I'm honestly questioning whether Anthropic wants anyone who isn't employed as a software engineer to use their kit.

    Anyway, for the last month or so Codex "just works" and Claude has been an invitation for annoyances. There was a time when Codex was quite a bit behind Claude Code. They have been roughly equal (different strengths and weaknesses) since at least February (for me).

    • I might consider switching to codex from claude pro 20x but I need the post tool use, pre file write and post user message hooks. Waiting on codex to deliver.

      - pre file write -> block editing code files without a task and plan of work

      - post tool use -> show next open checkbox in the task to the agent, like an instruction pointer

      - post user message -> log all user messages for periodic review of intent alignment

      These 3 hooks + plain md files make my claude harness (rough sketch of the pre-file-write one below).
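
      If I'm reading the hook docs right, those map roughly onto Claude Code's PreToolUse, PostToolUse, and UserPromptSubmit events. A minimal sketch of the "pre file write" one, assuming the documented stdin-JSON / exit-code contract (TASK.md is my own convention, not a Claude feature):

          #!/usr/bin/env bash
          # Registered under PreToolUse with a "Write|Edit" matcher in
          # .claude/settings.json; blocks code edits unless a plan file exists.
          if [ ! -f TASK.md ]; then
            echo "No TASK.md: write a task + plan of work before editing" >&2
            exit 2  # exit 2 blocks the tool call and feeds stderr back to the agent
          fi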

      4 replies →

  • Perhaps Anthropic should put a freeze on new signups until they can increase capacity. This is the best kind of problem for a business, I'm cheering for them.

  • > (claude -p still works on the sub but I get the feeling like if I actually use it, I'll get my Anthropic acct. nuked. Would be great to get some clarity on this. If I invoke it from my Telegram bot, is that an unauthorized 3rd party harness?)

    100% this, I’ve posted the same sentiment here on HN. I hate the chilling effect of the bans and the lack of clarity on what is and is not allowed.

    • In this case, they handled things pretty well. You can still use OpenClaw etc. with your regular Anthropic subscription; it will just count towards your extra credits / usage, which you can buy at a 30% discount compared to API pricing. And they gave everyone one month's value in credits.

      I don’t think they could have done much better, honestly.

      14 replies →

  • Why were third party harnesses banned? Surely they'd want sticking power over the ecosystem.

    • > Why were third party harnesses banned? Surely they'd want sticking power over the ecosystem.

      Third-party harnesses are the exact opposite of stickiness!

      Ditching Claude Code for a third party harness while using the Claude Code subscription means it's trivial to switch to a different model when you {run out of credits | find a cheaper token provider | find a better model}.

    • There’s the argument that Anthropic has built Claude Code to use the models efficiently, which the subscription pricing is based on.

      Maybe there’s some truth to that, but then why haven’t OpenAI made the same move? I believe the main reason is platform control. Anthropic can’t survive as a pipeline for tokens, they need to build and control a platform, which means aggressively locking out everybody else building a platform.

      14 replies →

    • Note that the thing that's banned is using third party harnesses with their subscription based pricing.

      If you're paying normal API prices they'll happily let you use whatever harness you want.

    • To be clear they weren’t banned from Claude usage, they were required to use the API and API rates rather than Claude Max tokens.

      Claude Code uses a bunch of best practices to maximize cache hit rate. Third party harnesses are hit or miss, so they often use a lot more tokens for the same task.

      7 replies →

    • One thing is lack of control over token efficiency on what’s already a subsidised product.

      Another thing is branding: Their CLI might be the best right now, but tech debt says it won’t continue to be for very long.

      By enforcing the CLI you enforce the brand value — you’re not just buying the engine.

      4 replies →

    • I want to differentiate between 2 kinds of harnesses:

      1. OpenClaw-like - using the LLM endpoint on subscription billing, with different prompts than Claude Code

      2. using claude cli with -p, in headless mode

      The second kind runs through their own code and prompts; it just calls claude in non-interactive mode for subtasks. I feel especially put off by restricting that kind. I need it to run judge agents to review plans and code (see the sketch below).
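
      The judge-agent case is just piping context into a non-interactive call, something like this (a sketch; the prompt and output file are arbitrary):

          # Headless judge: review the current branch's diff, write a verdict.
          git diff main...HEAD \
            | claude -p "Review this diff. Reply APPROVE or list blocking issues." \
            > review.txt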

  • > claude -p still works on the sub but I get the feeling like if I actually use it, I'll get my Anthropic acct. nuked

    I've used it with a sub a lot: concurrency of 40, writing descriptions of thousands of images, running for hours on Sonnet.
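
    Mechanically there's nothing exotic about such a run; it's roughly this (a sketch; the prompt and output naming are invented):

        # Caption thousands of images, 40 at a time, on the subscription.
        ls images/*.jpg | xargs -P 40 -I{} \
          claude -p "Read {} and write a short description to {}.txt" --model sonnet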

    I have a lot of complaints. I've cancelled my $200 subscription and when it runs out in a few days I'll have to find something else.

    But claude -p is fine.

    ... Or it was 2 weeks ago. Who knows if they've silently throttled it by now?

    • The other day I read that letting another agent invoke claude -p was considered a violation (i.e. letting OpenClaw delegate to Claude Code).

      Not sure how that's enforced, though. I was in the OpenClaw Discord a while ago and enforcement seemed a bit random.

      I'll try to find the source, I might have gotten the details mixed up.

      2 replies →

  • I will say I have noticed none of these things in my enterprise account. Is this a known targeting of non-enterprise clients only?

  • I think we are about a month away from a class action lawsuit; at their revenue they are a juicy target. And god knows they have the entirely self-inflicted unholy combination going on: marketing & sales that borders on fraud (X times the usage of plan Y, which has Z times the free tier, which has unknowable "magic tokens"), and then the actual fraud of reducing usage in fifteen different non-obvious, non-public ways.

  • I don't know why people are surprised. You just need to see what they say about China, open source, and their fake safety blogs to understand they're not a company that devs should give their code to for free.

  • Most of those issues are coming from a very small minority. A lot of the time it's good for businesses to focus on the customers driving the highest margin, which is most likely not users like yourself.

    1) Nobody should expect to use OpenClaw without API usage.

    2) We have known for a long time that the plans are subsidized. It was not as big of a deal before, but now that demand has continued to explode and tools like OpenClaw were creating a lot of usage from a small minority of customers, prices change.

    Everything for me points more towards: we have made a service people really want to use, and we are trying to balance a supply shortage (compute) with pricing. Nothing is stopping folks like yourself from simply paying the API rates. It is the simple, no-hassle way to get around any issue you are having: pay the API cost and you will have no limitations!

A month ago the company I work at with over 400 engineers decided to cancel all IDE subscriptions (Visual Studio, JetBrains, Windsurf, etc.) and move everyone over to Claude Code as a "cost-saving measure" (along with firing a bunch of test engineers). There was no migration plan - the EVP of Technology just gave a demo showing 2 greenfield projects he'd built with Claude Opus over a weekend and told everyone to copy how he worked. A week later the EVP had to send out an email telling people to stop using Opus because they were burning through too many tokens.

Claude seems to be getting nerfed every week since we've switched. I wonder how our EVP is feeling now.

  • Pretty bad decision on his part. I've been telling other engineers within my company who felt threatened by AI that this would happen. That prices would rise and the marginal cost for changes to big codebases would start to exceed the cost of an engineer's salary. API credits are expensive, especially for huge contexts, and sometimes the model will use $200 in credits trying to solve a problem that could be fixed in an hour by a good engineer with enough context.

    It kind of reminds me of the joke where a plumber charges $500 for a 5 minute visit. When the client complains the plumber says it's $50 for labor and $450 for knowing how to fix the problem.

    • A good lesson for all - I always really liked the Picasso version:

      In a bustling restaurant, an excited patron recognized the famous artist Picasso dining alone. Seizing the moment, the patron approached Picasso with a simple request. With a plain napkin and a big smile, he asked the artist for a drawing. He promised payment for his troubles. Picasso, ever the creator, didn’t hesitate. From his pocket, he produced a charcoal pencil and he brought to life a stunning sketch of a goat on the napkin—a clear mark of his unique style. Proudly, he presented it to the patron.

      The artwork mesmerized the patron, who reached out to take it, only to be stopped by Picasso’s firm hand. “That will be $100,000,” Picasso declared.

      Astonished, the patron balked at the sum. “But it took you just a few seconds to draw this!”

      With a calm demeanor, Picasso took back the napkin, crumpled it, and tucked it away into his pocket, replying, “No, it has taken me a lifetime.”

      4 replies →

    • It seems very unlikely that prices would rise in the long term. Yes, RAM and GPU prices are suddenly going up due to the demand spike and OpenAI's shenanigans, but I doubt it's going to last very long. Some combination of new capacity and reduced demand will most likely put things back on the usual course where this stuff gradually gets cheaper over time. And models are getting better, so next year you can probably get the same results for less compute. That $200 in credits becomes $150, then $100, then....

    • >That prices would rise

      Competition will prevent that from happening. When anyone can host open models and there is giant demand for LLMs, companies cannot easily raise token prices without sending a lot of traffic to their competitors.

      2 replies →

    • > the model will use $200 in credits trying to solve a problem that could be fixed in an hour by a good engineer with enough context

      So the price for fixing the problem is equal. Sounds like a great argument for AI.

      4 replies →

    • Even if you take it as true that prices have risen recently, and may continue to rise as the VC subsidies dry up, they will fall again long-term. Inference will get more power-efficient with model-on-chip solutions like Taalas, and God willing we will get cheaper and cheaper renewable energy.

      Despite this I don't think engineers should feel threatened. As long as there is a need for a human in the loop, as today, there will still be engineering jobs. And if demand for engineering effort is elastic enough, there could easily be even more jobs tomorrow.

      Rather than threatened, I think engineers should feel exposed. To danger, yes, but opportunity as well.

      3 replies →

  • I can’t believe how many small to mid size companies are being destroyed by bad decisions like this.

    A friend’s company fired all EMs and have engineers reporting to product managers. They aren’t allowed to do refactors because the CTO believes the AI doesn’t need organized code.

  • He must be feeling pretty good; after all, he still believes it was the right call, and he definitely won't be admitting a mistake.

    There's 0 chance of him facing the consequences for it either.

  • Hopefully that EVP feels embarrassed that a big bet was made that not only didn't pay off but left the company in a worse position. Some schadenfreude may be all you can expect, since this is an executive.

  • lol. Dude is so incompetent. Changing tools as a cost-cutting measure is so stupid; we all know real cost cutting is firing people. If he were really good at what he's doing, he'd just fire 10% of the people and replace them with his Claude. If that didn't backfire within 3 months, he'd be CTO.

Just anecdotal, but I was using Claude Code for everything a few months ago, and it seemed great. Now, it is making a ton of mistakes, doing the wrong thing, misunderstanding context, and just generally being unusable.

I now have been using Codex and everything has been great (I still swap back and forth but generally to check things out.)

My theory is just that the models are great after release to get people switching, then they cut them back in capabilities slowly over time until the next major release to increase the hype cycle.

  • Is it the models themselves or the tools around them? There's that patch[1] that floats around for Claude Code that's supposed to solve a lot of these problems by adjusting its tool-level prompts. Also, if it were the models themselves, wouldn't Cursor users have the same complaints (do they? I haven't heard anything but the only Cursor users I talk to are coworkers)?

    I think it's more likely they're trying to optimize the Claude Code prompts to reduce load on their system and have overcorrected at the cost of quality.

    1: https://gist.github.com/roman01la/483d1db15043018096ac3babf5...

  • Yeah, shorter time frame but I've been noticing that too. Just the other day I was experimenting with some workflow stuff. "Do x and y and run tests and then merge into develop."

    Duly runs, and finishes. "All merged into develop".

    I do some other work, don't see any of this, double check myself, I'm working off of develop.

    "Hey, where is this work?"

    "It is in this branch and this worktree, as you would expect, you will need to merge into develop."

    "I'm confused, I asked you to do that and you said it was done."

    "You're right and I did say that but I didn't do it. Shall I do it now?"

    There's this really weird balancing act between managing usage and making people burn more tokens...

I certainly noticed a significant drop in reasoning power at some point after I subscribed to Claude. Since then I've applied all sorts of fixes that range from disabling adaptive thinking to maxing out thinking tokens to patching system prompts with an ad-hoc shell script from a gist. Even after all this, Opus will still sometimes go round and round in illogical circles, self-correcting constantly with the telltale "no wait" and undoing everything until it ends up right where it started with nothing to show for it after 100k tokens spent.
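
The fixes themselves are mundane, for what it's worth; mine mostly amount to overrides along these lines (a sketch; MAX_THINKING_TOKENS is a documented Claude Code setting, the value and model choice are arbitrary):

    # Pin a large fixed thinking budget instead of letting it adapt downward.
    export MAX_THINKING_TOKENS=31999
    claude --model opus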

Whether it's due to bugs or actual malice, it's not a good look. I genuinely can't tell if it's buggy, if it's been intentionally degraded, if it's placebo or if it's all just an elaborate OpenAI psyop.

Yeah I’ve seen this too. It’s difficult for me to tell if the complaints are due to a legitimate undisclosed nerf of Claude, or whether it’s just the initial awe of Opus 4.6 fading and people increasingly noticing its mistakes.

  • Just one more anecdote:

    I'm on the enterprise team plan so a decent amount of usage.

    In March I could use Opus all day and it was getting great results.

    Since the last week of March and into April, I've had sessions where I maxed out session usage in under 2 hours and it got stuck in overthinking loops: multiple turns of realising the same thing, dozens of paragraphs of "But wait, actually I need to do x" with slight variations of the same realisation.

    This is not the 'thinking effort' setting in Claude Code; I noticed this happening across multiple sessions with the same thinking-effort settings. There was clearly some underlying, unpublished change that made the model get stuck in thinking loops for longer and more often, with no escape hatch to stop and prompt the user for additional steering when it gets stuck.

    • Whenever I see Opus say “but wait, …”—which is all the time—I get a little bit closer toward throwing my computer out the window. Sometimes I just collapse the thinking section, cross my fingers, and wait for the answer. It’s too frustrating watching the thinking process.

      3 replies →

    • I'm also an enterprise user and this has been my experience exactly. Same asks, same code bases, same models, much worse results. Everyone on my team is expressing the same thing.

      Not only that, but the lack of transparency about what's happening, in clear and simple terms, directly from Anthropic is concerning.

      I've already told my org's higher ups that in the current situation we're not close to getting our money's worth with these models.

    • This timing matches my experience: enterprise plan, but using Opus from VS Code. I finished a heavy refactor of a large C# codebase mid-March, tried to do basically the same thing early April, and couldn't.

    • It's probably because you didn't specify "make no mistakes" /s

      In all seriousness though, I've observed the same thing with my own usage.

  • I think there's a much more nefarious reason that you're missing.

    It's pretty clear that OpenAI has consistently used bots on social networks to peddle their products. This could just be the next iteration, mass spreading lies about Anthropic to get people to flock back to their own products.

    That would explain why a lot of users in the comments of those posts are claiming that they don't see any changes to limits.

    • The trouble with that argument, though, is that it works the other way as well: how do I, a random internet citizen, know that you're not doing the same thing for Anthropic with this comment?

      (FWIW I have definitely noticed a cognitive decline with Claude / Opus 4.6 over the past month and a half or so, and unless I'm secretly working for them in my sleep, I'm definitely not an Anthropic employee.)

      3 replies →

    • Judging from the number of GitHub issues on Anthropic's repos being shamelessly dismissed as "fixed", I doubt OpenAI needs bots to tarnish that competitor.

I have read the HN articles and seen the grumbling from coworkers, but I haven't felt it myself. I am not really a one-shotter, though. I kind of think about how I would refactor / write something myself and walk Claude through that, and nitpick it at each step... and the recent changes haven't really bothered me there. Likely due to being new at it.

Sometimes Claude can be a little weird. I was asking it about some settings in Grafana. It gave me an answer that didn't work. I told it that. "Yeah, I didn't really check, I just guessed." Then I said, "please check" and it said "you should read the discussion forums and issue tracker". I said "YOU should read the discussion forums and issue tracker". It consumed 35k tokens and then told me the thing I wanted was a checkbox. It was! I am not sure this saved me time, Claude. I am not experienced enough to say whether this is a deal breaker. While this is burned into my mind as an amusing anecdote, it doesn't ruin the service for me.

My coworkers have noticed a degradation and feel vindicated by some of the posts here that I link. A lot of them are using Cursor more now. I have not tried it yet because I kind of like the Claude flow and /effort max + "are you sure?" yield good results. For now. I'm always happy to switch if something is clearly better.

  • How exactly do you use Claude: in the browser? Claude Code? The desktop app (which has a "Code" tab)? Or some other way? I feel like people who have issues with Claude / Anthropic are not conveying where they are struggling. I see people say they tried "Claude" and didn't like it, but the secret sauce is Claude Code. Claude Code is what most people enjoy using, even if we all wish they would open up the harness, because there are so many more improvements that could go into it.

    • Yeah, sorry. Claude Code in my case.

      I do use the browser version on occasion. I have no strong feelings one way or the other there. I like it better than Google search in many cases, but probably just search more often.

Codex is my favored coding agent for generic "I need an agent" tasks. GPT-5.4 does a bit better with images compared to Claude, and debugs a little better.

The UX of Codex is exceptionally nice, however.

There are still plenty of "leave my fellow multibillion-dollar corp alone" type comments; apparently that means the corp can and should screw its loving customer base harder.

  • The enshittification meme has been taken so seriously that it gets shoehorned into every single place possible.

    It is not in Anthropic's interest to screw its customer base. Running a frontier lab comes with tradeoffs between training, inference and other areas.

    • It doesn't matter whether it is in Anthropic's interest to screw its customer base: if their reported monthly revenue growth is accurate, then it makes perfect sense why Claude would be getting dumber...

      Demand is way up and compute supply is extremely limited because data center buildouts can't keep up with demand.

      In the face of rising demand and insufficient compute, their only practical options (other than refusing new business until demand can be met) are significantly raising the price of tokens (and more tightly limiting subscription options) or doing behind-the-scenes inference optimizations that are likely to make the model dumber.

      It is very easy to believe that they took the route of inference optimizations that have reduced quality of the service and that that is where the perceived enshittification is coming from.

I was going to do a deep analysis on this, and then I noticed that Claude Code deleted all of my sessions before March 6.

So yeah... I'm not thrilled with that, because I had done a similar analysis in December and had plenty of logs to review.

The results I do have for the last month aren't great. If you're curious I did post the results on HN:

https://news.ycombinator.com/item?id=47679661

Anthropic seems to be playing the giant-tech rent-capture game that all of the old guard has played for the past few years. We thought the new age of AI might bring some fresh air into the mix, but I guess that optimism quickly faded.

The $20 a month plan still seems like a pretty good deal for me (intermittent coding and not doing it for income).

I dunno, I haven’t really felt gimped in the past few months. My last issue was somewhere after the holidays when the usage suddenly felt like it cratered, but quality has been consistent.

I switched off Claude when they nerfed Opus 4.5 in August 2025; since then Codex has clearly produced better code with fewer bugs. Opus 4.6 was more a temporary de-nerf of 4.5 and did not materially improve things. Codex now has a proven track record of producing stable results while introducing far fewer bugs.

I saw a big hit to Claude’s intelligence w/ the 1M context window model and the change to adaptive reasoning (github issue linked elsewhere in this thread).

I’m pretty much using 90% Codex now, although since Claude is consistently faster at answering quick questions, I still keep it open for that and for code-reviewing codex/human work before commit.

On OpenRouter, token consumption is up 5x since November 2025. If this is indicative of the industry's growth, then I can't fathom how we will not hit resource constraints.

Yes. Anthropic is burning much of the goodwill they built up in contrast to OAI, and I personally am taking it as a sign to limit dependencies. Luckily for me I am not at all dependent on frontier models, and it's increasingly apparent that nobody else is either.

It looks like the spreadsheet-touchers over at Anthropic won out over the brand leaders, which is too bad, as goodwill can be a moat if you don't abuse your customers.

  • I think on HN we always underestimate how much momentum matters. Anthropic has so much clout and mindshare that even if they continue burning goodwill and everyone on HN ditches Claude Code and stops recommending it, they will still be revenue leader for years to come. Those enterprise contracts aren’t month-to-month.

I'd say weaker: tasks Claude Code was acing before, it now fails with the exact same prompts, taking several rounds before it works. I'm looking to jump ship.

> people feel like they have no idea if they are getting the product that they originally paid for

They do indeed get the product they originally paid for.

It's simply that they were suckers and didn't read the "fine" print of the product they bought.

The label says "more tokens than the lower tier".

Is it perhaps not a model problem but a Claude Code harness problem?

For instance on exe.dev VMs with Shelley agent/harness and Opus 4.5/4.6, I haven't noticed any deterioration.

Any similar feedback perhaps from Opencode / GH Copilot subscription-provided Opus models?

My working theory is that all models are approximately the same, and the variance in quality mostly depends on how long they think for.

So the trick is to always set to max, and then begin every task with “this is an extremely complex task, do not complete it without extensive deep thinking and research” or whatever.

You’re basically fighting a battle to make the model think more, against the defaults getting more and more nerfed to save costs.
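
In practice that battle reduces to a wrapper like this (a sketch; the function name and preamble wording are mine):

    # Force-prepend the "think hard" preamble to every headless prompt.
    deepthink() {
      claude -p "This is an extremely complex task, do not complete it without extensive deep thinking and research. $*"
    }
    deepthink "why does the importer deadlock under load?"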

  • My experience has been that this isn't generally true, mainly because worse models pursue red herrings or get confused and stuck. A better model will get to the correct solution in fewer tokens, and my surface-level understanding of how RL works supports this.

It has been my go-to provider, but I noticed an extraordinarily high usage rate last month on a little side project I started so I could learn about things that interest me while helping my day-to-day responsibilities (creating an Iceberg data lake from my existing Parquet files). I used my month's worth of corporate-subscription tokens in 3 days. I've never seen that before, so now I'm a lot more apprehensive about getting into the weeds with Claude, but I'm also much less impressed with the other available models for work in this domain.

At some point these AI companies need to pay the piper as it were and actually provide a return for their investors. Expect cost cutting attempts to continue unless backlash is great enough to pose an existential threat to these companies.

It's not just engineers, and it's not just about the 3rd party / rate limiting stuff. I feel like the reasoning capabilities have deteriorated for non-coding tasks too.

It feels like I'm getting less and less for my money every day. A few weeks ago I was programming all week and never getting close to the limit, yesterday half my weekly limit went away in a day. Changing the limits mid-subscription is just theft.

Anthropic isn't your friend.

Phase 1: $200/mo prosumer engineer tool

Phase 2: AI layoffs / "it's just AI washing"

Phase 3: $20,000/mo limited release model "too dangerous" to use

Phase 4: Accelerated layoffs / two person teams. Rehiring of certain personnel at lower costs.

Phase 5: "Our new model can decompile and rewrite any commercial software. We just wrote a new kernel after looking at Linux (bye, bye GPL!) We also decompiled the latest Zelda game, ported the engine to Rust, and made a new game with it. Source code has no value. Even compiled and obfuscated code is a breeze to clone."

Phase 6: $100k/mo model that replicates entire engineering teams, only large companies can afford it. Ordinary users can't buy. More layoffs.

Phase N: People can't afford computing anymore. Everything is thin clients and rented. It's become like the private railroad industry. End of the PC era. Like kids growing up on smartphones, there's nothing to tinker with anymore. And certainly no gradient for entrepreneurship for once-skilled labor capital.

Anthropic used to be cool before they started gating access. Limiting Claw/OpenCode was strike one. Mythos is strike two.

Y'all should have started hating on their ethics when they started complaining about being distilled, given the training they conducted on materials they did not own.

We need open weights companies now more than ever. Too bad China seems to be giving up on the idea.

"You wouldn't distill an Opus."

  • Stop thinking billion dollar publicly traded companies are "cool" just because they make a widget you like.

    You will be backstabbed

    You will be squeezed for all they can get.

    And you will be betrayed.

    > Phase N: People can't afford computing anymore. Everything is thin clients and rented. It's become like the private railroad industry. End of the PC era. Like kids growing up on smartphones, there's nothing to tinker with anymore. And certainly no gradient for entrepreneurship for once-skilled labor capital.

    Thankfully none of them actually makes money; they just run on investment, so there is a good chance the bubble will pop and the price of PC equipment will... continue to rise as the US gives up Taiwan to China.

    • > Stop thinking billion dollar publicly traded companies are "cool" just because they make widget you like.

      Anthropic is a private company but nevertheless, the sentiment is accurate and applies to all kinds of corporations.

      1 reply →

  • What I want to know is how did they make the only LLM that doesn't sound cringe?

    I think it has something to do with mode collapse (although Claude certainly has its own "tells"), but I'm not sure.

    It sounds trivial, but even for agentic work I found the writing style to be really important. When you give Claude a persona, it sounds like the thing. When you give GPT a persona, it sounds like GPT half-assedly pretending to be the thing.

    ---

    Some other interesting points about Anthropic's models. I don't know if any of these relate to my LLM style question, but seems worth mentioning:

    Claude models also use far fewer tokens for the same task (on ArtificialAnalysis, they are a clear outlier on this metric).

    And there's a much stronger common sense, subjectively (not sure we have a good way to actually measure that, though). It takes context and common sense into account to a much greater degree.

    (Which ties in with their constitution. Understanding why things are wrong at a deeper level, rather than just surface level pattern matching.)

    Opus is great but it should be bigger. You notice the difference between Sonnet and Opus, but with heavy use you notice Opus's limitations, too.

  • Good read on the situation.

    It all boils down to a brilliant but extremely expensive technology. Both to build and to run.

    We've been sold a product with heavy subsidy. The idea (from Sam): scale out and see what happens.

    Those who care to read between the lines can see what's happening: a perfect storm of demand that attracts VCs who can't understand that they are the real customers. Once they understand, it will be too late.

    Regarding open weight models: eventually we will, as humanity, benefit from the astronomical capital poured into developing a technology ahead of its time. In a few years this and even more will run on the edge.

    It will be written by open source developers, likely former OpenAI and Anthropic employees with so much cash in the bank that they don't need to worry about renting out their knowledge.

  • What leads you to say China AI is giving up on open weights?

    I've been using GLM for over 6 months and pretty happy.

    • Why would any company release open weights once the investment money stops?

      Releasing open weights has been basically a PR move; the moment those companies need to actually make money, they will cut it, as it reduces their client base.

      They DO NOT want you to run AI. They want you to pay them to do it.

      5 replies →

    • People keep repeating this without any real thought behind it because of the high profile resignations on the Qwen team. Meanwhile the Minimax team just released a new open weights version of their 229B model yesterday. So much for that narrative.

      The AI landscape in China is larger than just Qwen and Alibaba.

      10 replies →

  • > We need open weights companies now more than ever.

    If your objective is to democratize AI, sure. But those fed up with it and the devastating effects it's having on students, for example, can opt to actively avoid paying for products with AI (I say this as someone who uses it every day, guilty). At some point large companies will see that they're bleeding money for something that most people don't seem to want, and cancel those $100k/mo deals. I've already watched one company that pivoted to AI development crash and burn.

    Personally, I don't think this LLM-based AI generation will have any significant positive impacts. Time, energy (CO2) and money would have been far better spent elsewhere.

    • There's plenty of valuable use cases for being able to give natural language instructions to a tool and have it act on that input. I do however agree that the current hype and valuations far exceed the real value being offered.

      Like with the dot com bubble there will be a crash and then whatever shakes out of that will be the companies and products who invested in understanding the actual strengths and weaknesses of the tech, instead of just trying to slap an "AI" sticker on everything.

  • > End of the PC era, there's nothing to tinker with anymore. And certainly no gradient for entrepreneurship for once-skilled labor capital.

    This one seems too far fetched. Training models is widespread. There will always be open weight models in some form, and if we assume there will be some advancements in architecture, I bet you could also run them on much leaner devices. Even today you can run models on Raspberry Pis. I don't see a reason this will stop being a thing, there will be plenty of ways to tinker.

    However, keep in mind the masses don't care about tinkering and never have. People want a ChatGPT experience, not a pytorch experience. In essence this is true for all tech products, not just AI.

I'm pretty sure this is an attempt by both companies to shape a reasonable finance story for their eventual IPO. They need to make this look a lot better than a pump and dump (raising on wild valuations then offloading onto public investors).

That's a seasonal phenomenon. You can save this comment and look back three to six months later. By then people will be like "is it just me or has ChatGPT been so bad lately?"

If you don't believe me you can search HN posts about Codex/Claude six months ago.

I can't believe how quickly they went from riding high on anti-OpenAI sentiment post-DOD fiasco, to shooting themselves and all their users new and old in the foot.

The ideal time to make your product worse is probably not at the same point that all of your competitor's customers are looking. Anthropic really, really fucked up here.

And beyond that, there's a ton of people who are just regular 9-5 Claude CLI users with an enterprise subscription who are getting punished with a worse model at the same price just as if we were Claw users. This kind of thing does not make one feel warm and fuzzy. I feel like I just got a boot to the teeth.

  • The hypothesis that makes the most sense is not that they are idiots, but that they have no choice. They cannot meet the new demand. So they’ve quantized the model.

I think so, but more than that: the performance of those tools seems to be degrading terribly even as they keep saying they have created some crap like AGI, which we know is a lie.

And to me, this lie is mostly a fight to see who bites off the biggest chunk of the war death machine.

Generally, across AI providers, I have come to interpret sudden degradation in existing capabilities as a signal that a new, more expensive, product tier is about to launch.

They broke my OpenClaw last week; I switched to “extra usage” and prepaid a grand for same.

A few days later it simply stopped working again, API authentication error. What must I do to have working, paid, premium service?

Screwing around with it today, it works 5x slower and times out all of the time. I'm paying more and getting waaaaay less. Why can't companies just raise prices like normal?

I measured it for my specific use cases and have cancelled my Anthropic subscription (the Max x20 plan).

The past two weeks I've had code that was delivered and declared done (it did pass tests) but then failed review by Codex. This has looped to a painful extent. The code in question deals with concurrency, so there's an acknowledgement that it's trickier, but still, I expect more from Claude.