
Comment by g-mork

3 days ago

My answer to this is simply rolling back to the pro plan for interactive usage in the coming month, and forcefully cutting myself over to one of the alternative Chinese models to just get over the hump and normalise API pricing at a sensible rate with sensible semantics.

Dealing with Claude going into stupid mode 15 times a day, constant HTTP errors, etc. just isn't really worth it for all it does. I can't see myself justifying $200/mo. on any replacement tool either, the output just doesn't warrant it.

I think we all jumped on the AI mothership with our eyes closed and it's time to dial some nuance back into things. Most of the time I'm just using Opus as a bulk code autocomplete that really doesn't take much smarts comparatively speaking. But when I do lean on it for actual fiddly bug fixing or ideation, I'm regularly left disappointed and working by hand anyway. I'd prefer to set my expectations (and willingness to pay) a little lower just to get a consistent slightly dumb agent rather than an overpriced one that continually lets me down. I don't think that's a problem fixed by trying to swap in another heavily marketed cure-all like Gemini or Codex, it's solved by adjusting expectations.

In terms of pricing, $200 buys an absolute ton of GLM or Minimax, so much that I doubt my own usage will get anywhere close to $200, going by ccusage output. Minimax generating a single output stream at its max throughput 24/7 only comes to about $90/mo.
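For anyone wanting to sanity-check that kind of figure, the arithmetic is simple; the throughput and per-token price below are illustrative assumptions picked for the sketch, not Minimax's actual published numbers.

```python
# Back-of-envelope cost of one output stream running 24/7.
# Both constants are assumptions for illustration, not real pricing.
TOKENS_PER_SECOND = 100        # assumed sustained output throughput
PRICE_PER_M_TOKENS = 0.35      # assumed $ per 1M output tokens
SECONDS_PER_MONTH = 30 * 24 * 3600

tokens = TOKENS_PER_SECOND * SECONDS_PER_MONTH
cost = tokens / 1_000_000 * PRICE_PER_M_TOKENS
print(f"{tokens / 1e6:.0f}M tokens/mo -> ${cost:.0f}/mo")
```

With those placeholder numbers it lands around $91/mo, so a sub-$100 figure for continuous single-stream generation is at least plausible.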

I put in probably thousands of Claude session hours a month, aggregated across work + personal.

I must be missing something or supremely lucky because I feel like I’ve never hit these “stupid” moments.

If I do, it’s probably because I forgot to switch off of haiku for some tiny side thing I was doing before going back to planning.

  • There are 720 hours in a month. You'd have to be running 3 sessions in parallel continuously to be doing thousands of session-hours in a month. Are individual people really doing this?!

    • We do.

      I work with 3-5 parallel sessions most of the time. Some of the projects are related, some are not, some sessions are just managing and tuning my system configuration, whatever it means at a given time.

      It doesn't feel weird to me.


    • Our developers work office hours, but would frequently have 10 plus sessions open. Massive parallelism is one of the benefits of agentic coding.

  • Similar usage here. But I've encountered these moments, and I chalk them up to the random nature of LLMs. Back in the Sonnet 3.5 days, it would happen every other day. I even built a 'you are absolutely right' tracker back then to measure it. With Opus 4.6, maybe once or twice a month.

    • Yes, subjectively there do seem to be moments where the quality of the output drops significantly - usually during US peak hours.

  • It's possible that it's simply paranoia, but moments where Opus starts acting like Haiku seem to correlate with periods of higher latency and HTTP errors. Don't like reporting this because it's so hand-wavy and conspiratorial, but it's difficult not to think they're internally using extraordinary measures of some sort to manage capacity.

    But even when Opus is running healthy, it still doesn't address the underlying issue that these models can only do so much. I have had Opus build out a bunch of apps, but I still find my time absorbed as soon as anything genuinely exceeds "CRUD level difficulty". Asking it to fix a subtle visual alignment issue, make a small change to a completely novel algorithm, or just fix a tiny bug without having to watch for "Oh, this means I should rewrite module <X>" simply isn't possible while still being able to stand over the work.

    It's not to say I don't get a massive benefit from these tools, I just think it's possible to be asking too much of them, and that's maybe the real problem to solve.

  • Most people hate reading. Therefore they don't know how to write. Therefore they can't prompt properly. Not to mention so many "enemies of logic" cults being so strong nowadays.
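A commenter above mentions building a 'you are absolutely right' tracker. A minimal sketch of that idea, assuming session transcripts are saved as plain-text files (the directory layout and phrase pattern here are hypothetical, not anything a particular tool actually writes):

```python
import re
from pathlib import Path

# Phrases that signal the model capitulating; extend as needed.
PATTERN = re.compile(r"you(?:'| a)re (?:absolutely )?right", re.IGNORECASE)

def count_capitulations(log_dir: str) -> int:
    """Count capitulation phrases across all .txt transcripts in log_dir."""
    total = 0
    for path in Path(log_dir).glob("*.txt"):
        total += len(PATTERN.findall(path.read_text(errors="ignore")))
    return total
```

Run it weekly over your transcript directory and you get a crude per-model "capitulation rate" to compare across releases.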

I literally hit my 5 hour window limit in 1.5 hours every single day now.

2 weeks ago, I had only hit my limit a single time and that was when I had multiple agents doing codebase audits.

  • Anthropic had a special extra usage promotion going on during non-peak hours that ended recently.

    They didn’t do a great job of explaining it. I wonder how many people got used to the 2X limits and now think Anthropic has done something bad by going back to normal.

  • Are you monitoring the size of your context windows? As they grow, so does the cost of every operation performed in that state.

  • We hit that problem 3 or 4 weeks ago and then we rolled back to version 2.1.44 and that apparently solved the fast consumption issue.

    Our problems started when we moved to the Claude Code installer (it only affected the people who had updated) instead of using the npm version. Last week someone tried the installer version again and the problems seem to have gone away. This is all very anecdotal, so take it with a grain of salt.

  • They've been running a "double credits" promo for several weeks, which expired on the first of this month.

  • I've been using Codex extensively, 5.4 at "Extra High", and have yet to hit a limit on the $20 plan.

    • It very much depends on the workloads. If you inspect existing code (that somebody else wrote over the years) usage runs out quickly. If you are building your own greenfield stuff the sky is the limit.

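On the context-window point above: chat APIs are stateless, so every turn resends the full history, and cumulative input tokens grow roughly quadratically with turn count. A toy model (the per-turn token count is an arbitrary assumption):

```python
def cumulative_input_tokens(turns: int, tokens_per_turn: int = 1_000) -> int:
    # Turn k re-sends the k-1 prior turns' worth of context plus its own message,
    # so the total is tokens_per_turn * (1 + 2 + ... + turns).
    return sum(k * tokens_per_turn for k in range(1, turns + 1))

print(cumulative_input_tokens(10))    # 55_000
print(cumulative_input_tokens(100))   # 5_050_000: 10x the turns, ~92x the tokens
```

That superlinear growth is why a few long sessions can burn through a quota that many short sessions never would, and why periodically compacting or restarting the context saves real money.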

I think my next steps are: 1) try out OpenAI's $20/month plan. I've heard they're much more generous. 2) try out OpenRouter's free models. I don't need geniuses; so long as I can see the thinking (something that Claude Code obfuscates by default) I should be good. I've heard good things about the CLIO harness and want to try OpenRouter + CLIO.

  • Word on the street is that Opus is much much larger of a model than GPT-5.4 and that’s why the rate limits on Codex are so much more generous. But I guess you could also just switch to Sonnet or Haiku in Claude Code?

  • OpenAI has the better coding model anyways. You will be pleasantly surprised by Codex. The TUI tool is less buggy and runs faster and it's a more careful and less error-prone model. It's not as "creative" but it's more intelligent.

    On top of that their $20 plan has much higher usage limits than Anthropic's $20 plan and they allow its use in e.g. opencode. So you can set up opencode to use both OpenAI's codex plan plus one of the more intelligent Chinese models so you can maximize your usage. Have it fully plan things out using GPT 5.4, write code using e.g. Qwen 3.6, then switch back to GPT 5.4 for review

  • OpenRouter free models have a 50-requests-per-day limit plus data collection, per their docs.

    • You can charge $10 on the account and get unlimited requests. I abused this last week with Nemotron Super to test out some stuff and made probably over 10,000 requests over a couple of days and didn't get blocked or anything, except for 5xx errors and slowdowns, though.

  • I tried out GPT 5.4 xhigh and it did meaningfully worse with the same prompt as Opus 4.6. Like, obvious mistakes.

    • I've been pretty satisfied using oh-my-openagent (omo) on opencode with both opus-4.6 and gpt-5.4 lately. The author of omo suggests different prompting strategies for different models and goes into some detail here: https://github.com/code-yeongyu/oh-my-openagent/blob/dev/doc... For each agent they define, the prompt changes depending on which model is being used. I wonder how much of the "x did worse than y for the same prompt" tests could be improved if the prompts were actually tailored to what the model is good at. I also wonder if any of this matters or if it's all a crock of bologna.


    • Fwiw I run this eval every week on a set of known prompts, and I believe the in-group differences are bigger than the out-group ones.

      That is, I get more variance between Opus 4.6 and itself than I do between the SOTA models.

      I don't have the budget for statistical significance, but I'm convinced people claiming broad differences are just vibing, or there are times when agent features make a big difference.


Every service is being sold at a deep discount chasing market share, but it's not lasting forever.

  • Speaking only personally of course, I'm completely over the chat idiom in almost every way. Where is all this future demand coming from? By the time Android lands a God mode ultimate voice assistant it's pretty much guaranteed I will be well beyond the point where I'd want to use it. The whole thing is starting to remind me of 3G video calling where the networks thought it'd change everything, and by the end of it with all the infrastructure in place, the average user has made something like 0.001 3G-native video calls over the lifetime of their usage.

    Would really love some path forward where the AI parts only poke out as single fields in traditional user interfaces and we can forget this whole episode.

    • I don't understand this perspective. I can't imagine a point where I won't want to ask "what's the weather like?", "please turn off the lights", or "what is the airspeed of an unladen swallow?" Likewise, chatting to direct it through building something or solving a problem: voice and typing will each have their place.

      And video calling did take off: plenty of people use FaceTime, and almost everybody working in an office uses some form of video calls. Criticizing the early attempts at getting video calling working just because they hadn't taken off yet misses the point (I remember them being advertised on "video phones" with 56k modems); of course someone was going to have the idea and implement it before it was quite practical.


    • I agree with you and the GP post, even though I am an LLM enthusiast.

      My primary interest is using small edge models to perform specific engineering tasks. In this pursuit I do like to use gemini-cli or Antigravity with Claude a few times a week as coding assistants, but I am using relatively few tokens to do this.

      I also waste a lot of time, but it's fun time: experimenting with open source coding agents and local models just to see what kinds of results I can get. It's mostly unproductive, but I enjoy it.

      My other favorite use pattern: once or twice a week I like to use the iOS Gemini app in voice mode, and once a month also use video input. I really like this, but it is not life changing.

      Externalities matter: I never use frontier LLM-based AI without thinking of energy, data center, and environmental costs.

Please don't use grossly offensive terms in this forum. That sort of language is not welcome here.

> I think we all jumped on the AI mothership with our eyes closed

Oh no, there's plenty of us willing to say we told you so.

What's more interesting to me is what it's going to look like if big companies start removing "AI usage" from their performance metrics and cease compelling us to use it. More than anything else, that's been the dumbest thing to happen with this whole craze.

Are you using the Chinese models through their individual services or via an intermediary layer?

  • I am not the person you are responding to but I have tried both: using OpenRouter and also giving a Chinese company $5 on my credit card to buy tokens. If I know what model I want to experiment with, I much prefer to just pay $5 and have plenty of tokens to experiment. On a yearly basis, this is a very tiny expense for the benefits of getting plenty of tokens to experiment with.

constant HTTP errors

Dealing with these right now with ChatGPT. Bricked a thread, which I didn’t even know was possible.

This is what I did: downgraded to Pro and pay for opencode zen for the open models. I like the combo of the two.

  • Oh, https://opencode.ai/zen looks good. I like pay as you go plans since I usually don’t use many tokens compared to vibe coders.

    I regret paying Google for a one-year AI subscription last spring (although it was a deep discount over the regular $20/month cost) because it has kept me from experimenting with many vendors (but it was a fantastic deal financially).

    I just put a reminder on my calendar to try OpenCode zen when my subscription ends.

> I think we all jumped on the AI mothership with our eyes closed and it's time to dial some nuance back into things.

I’m kind of confused by these takes from HN readers. I could see LinkedIn bros getting reality checked when they finally discover that LLMs aren’t magic, but I’m confused about how a developer could go all-in on AI and not immediately realize the limitations of the output.

  • It has indeed been baffling. As I dig deeper into what developers are doing with AI, it's basically like what I did customizing and tweaking Emacs when I was younger (and fine, I'll admit I still do it sometimes). They are having so much fun playing with these new tools that they aren't really noticing how little the new tools are actually helping them.

  • > immediately realize the limitations of the output.

    I'm "all-in" on AI code generation. I very much realise their limitations; it's like any tool, really. I do think they're magic, you just need to learn how to wield the power.