Comment by troupo

2 months ago

> Opus 4.5 really is at a new tier however. It just...works.

Literally tried it yesterday. I didn't see a single difference with whatever model Claude Code was using two months ago. Same crippled context window. Same "I'll read 10 irrelevant lines from a file", same random changes etc.


EMM_386 2 months ago

The context window isn't "crippled".

Create a markdown document of your task (or use CLAUDE.md), put it in "plan mode" which allows Claude to use tool calls to ask questions before it generates the plan.

When it finishes one part of the plan, have it create another markdown document ("progress.md" or whatever) with the whole plan and what is completed at that point.

Type /clear (no more context window), tell Claude to read the two documents.

Repeat until even a massive project is complete - with those 2 markdown documents and no context window issues.
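For illustration, the progress document described above could be kept in a structure like the following. This is only a minimal sketch: the `Plan` and `Step` names, the checkbox layout, and the example file path are assumptions made up for this example, not something Claude Code itself produces or requires.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Step:
    title: str
    done: bool = False
    note: str = ""  # short pointer for later phases, e.g. where earlier code lives


@dataclass
class Plan:
    goal: str
    steps: list[Step] = field(default_factory=list)

    def complete(self, title: str, note: str = "") -> None:
        """Mark a step finished and record a short note for later phases."""
        for step in self.steps:
            if step.title == title:
                step.done = True
                step.note = note
                return
        raise KeyError(title)

    def next_step(self) -> Step | None:
        """Return the first step that is not done yet, or None if all are done."""
        return next((s for s in self.steps if not s.done), None)

    def to_markdown(self) -> str:
        """Render the whole plan plus what is completed so far."""
        lines = [f"# Progress: {self.goal}", ""]
        for step in self.steps:
            box = "x" if step.done else " "
            suffix = f" ({step.note})" if step.note else ""
            lines.append(f"- [{box}] {step.title}{suffix}")
        return "\n".join(lines) + "\n"


# Hypothetical project and file path, purely for illustration.
plan = Plan("Migrate billing service", [Step("Phase 1: schema"), Step("Phase 2: API")])
plan.complete("Phase 1: schema", note="see db/schema.sql")
print(plan.to_markdown())
```

After a /clear, the rendered text is all that needs to be read back in: it carries the full plan, what is finished, and the notes pointing at earlier work.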

troupo 2 months ago
> The context window isn't "crippled".
... Proceeds to explain how it's crippled and all the workarounds you have to do to make it less crippled.
- EMM_386 2 months ago
  
  > ... Proceeds to explain how it's crippled and all the workarounds you have to do to make it less crippled.
  No - that's not what I did.
  You don't need an extra-long context full of irrelevant tokens. Claude doesn't need to see the code it implemented 40 steps ago in a working method from Phase 1 if it is on Phase 3 and not using that method. It doesn't need reasoning traces for things it already "thought" through.
  This other information is clutter, not help. It makes the signal-to-noise ratio worse.
  If Claude needs to know something it did in Phase 1 for Phase 4, it will put a note about it in the living markdown document so it can simply find it again when it needs it.
  

mikestorrent 2 months ago

200k+ tokens is a pretty big context window if you are feeding it the right context. Editors like Cursor are really good at indexing and curating context for you; perhaps it'd be worth trying something that does that better than Claude CLI does?

troupo 2 months ago
> a pretty big context window if you are feeding it the right context.
Yup. There's some magical "right context" that will fix all the problems. What is that right context? No idea, I guess I need to read yet another 20,000-word post describing magical incantations that you should or shouldn't do in the context.
The "Opus 4.5 is something else/next tier/just works" claims in my mind mean that I wouldn't need to babysit its every decision, or that it would actually read relevant lines from relevant files etc. Nope. Exact same behaviors as whatever the previous model was.
Oh, and that "200k tokens context window"? It's a lie. The quality quickly degrades as soon as Claude reaches somewhere around 50% of the context window. At 80+% it's nearly indistinguishable from a model from two years ago. (BTW, same for Codex/GPT with its "1 million token window")
- theshrike79 2 months ago
  
  It's like working with humans:
  1) define problem 2) split problem into small independently verifiable tasks 3) implement tasks one by one, verify with tools
  With humans 1) is the spec, 2) is the Jira or whatever tasks
  With an LLM usually 1) is just a markdown file, 2) is a markdown checklist or GitHub issues (which Claude can use with the `gh` cli), and every loop of 3) gets a fresh context, maybe with the spec from 1) and the relevant task information from 2)
  I haven't run into context issues in a LONG time, and if I have it's usually been either intentional (it's a problem where compacting won't hurt) or an error on my part.
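As a rough illustration of that loop, the sketch below reads a markdown checklist and builds a fresh context for each open task from the spec plus that one task. The function names, the `## Current task` heading, and the sample spec are all invented for this example; how the prompt is actually handed to a new session depends on the tool and is not shown.

```python
import re

# Matches "- [ ] task" or "- [x] task" (also "*" bullets).
CHECKBOX = re.compile(r"^\s*[-*] \[( |x|X)\] (.+)$")


def open_tasks(checklist_md: str) -> list[str]:
    """Return the text of every unchecked item, in document order."""
    tasks = []
    for line in checklist_md.splitlines():
        match = CHECKBOX.match(line)
        if match and match.group(1) == " ":
            tasks.append(match.group(2).strip())
    return tasks


def fresh_context(spec_md: str, task: str) -> str:
    """Build the prompt for one loop of step 3: the spec plus a single task."""
    return f"{spec_md.strip()}\n\n## Current task\n{task}\n"


# Hypothetical spec and checklist, purely for illustration.
spec = "# Spec\nBuild a CSV importer."
checklist = (
    "- [x] Parse header row\n"
    "- [ ] Validate column types\n"
    "- [ ] Write rows to the database\n"
)

for task in open_tasks(checklist):
    prompt = fresh_context(spec, task)
    # Each prompt would start a new session; nothing from earlier loops is carried over.
    print(prompt)
```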
  
- CuriouslyC 2 months ago
  
  I realize your experience has been frustrating. I hope you see that every generation of model and harness is converting more hold-outs. We're still a few years from hard diminishing returns assuming capital keeps flowing (and that's without any major new architectures which are likely) so you should be able to see how this is going to play out.
  It's in your interest to deal with your frustration and figure out how you can leverage the new tools to stay relevant (to the degree that you want to).
  Regarding the context window, Claude needs thinking turned up for long-context accuracy; it's quite forgetful without thinking.
  
- mikestorrent 2 months ago
  
  > There's some magical "right context" that will fix all the problems.
  All I can tell you is that in my own lived experience, I've had some fantastic results from AI, and it comes from telling it "look at this thing here, ok, i want you to chain it to that, please consider this factor, don't forget that... blah blah blah" like how I would have spelled things out to a junior developer, and then it really does stand a really solid chance of turning out what I've asked for. It helps a lot that I know what to ask for; there's no replacing that with AI yet.
  So, your own situation must fall into one of these coarse buckets:
  - You're doing something way too hard for AI to have a chance at yet, like real science / engineering at the frontier, not just boring software or infra development
  - Your prompts aren't specific enough, you're not feeding it context, and you're expecting it to one-shot things perfectly instead of having to spend an afternoon prompting and correcting stuff
  - You're not actually using and getting better at the tools, so you're just shouting criticisms from the sidelines, perhaps as sour grapes because you're not allowed by policy / company can't afford to have you get into it.
  IDK. I hope it's the first one and you're just doing Really Hard Things, but if you're doing normal software developer stuff and not seeing a productivity advantage, it's a fucking skill issue.

pluralmonad 2 months ago

I'm not familiar with any form of intelligence that does not suffer from a bloated context. If you want to try and improve your workflow, a good place to start is using sub-agents so individual task implementations do not fill up your top-level agent's context. I used to regularly have to compact and clear, but since using sub-agents for most direct tasks, I hardly do anymore.
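To make the idea concrete, here is a toy sketch of why delegation keeps the top-level context small: each sub-agent builds up its own long transcript, but only a one-line summary comes back. The `SubAgent` type, `delegate`, and `fake_sub_agent` are hypothetical stand-ins written for this example and do not correspond to any real Claude Code API.

```python
from typing import Callable

# Stand-in for "run one task in its own context": takes a task description and
# returns (full_transcript, short_summary). Hypothetical, not a real API.
SubAgent = Callable[[str], tuple[list[str], str]]


def delegate(tasks: list[str], sub_agent: SubAgent) -> list[str]:
    """Run each task in a sub-agent and keep only the summaries at the top level."""
    top_level_context: list[str] = []
    for task in tasks:
        _transcript, summary = sub_agent(task)  # the transcript is dropped here
        top_level_context.append(f"{task}: {summary}")
    return top_level_context


def fake_sub_agent(task: str) -> tuple[list[str], str]:
    """Toy sub-agent that produces a long transcript and a one-line result."""
    transcript = [f"step {i} of {task}" for i in range(50)]
    return transcript, "done"


context = delegate(["add parser", "write tests"], fake_sub_agent)
print(context)
```

In this toy run the two sub-agents generate 100 transcript lines between them, while the top-level context grows by only two entries.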

troupo 2 months ago

1. It's a workaround for context limitations
2. It's the same workarounds we've been doing forever
3. It's indistinguishable from "clear context and re-feed the entire world of relevant info from scratch" we've had forever, just slightly more automated
That's why I don't understand all the "it's new tier" etc. It's all the same issues with all the same workarounds.

iwontberude 2 months ago

I use Sonnet and Opus all the time and the differences are almost negligible

llmslave2 2 months ago

That's because Opus has been out for almost 5 months now lol. It's the same model, so I think people have been vibe coding with a heavy dose of wine this holiday and are now convinced it's the future.

Leynos 2 months ago

Opus 4.5 was released 24th November.
spaceman_2020 2 months ago
Looks like you hallucinated the Opus release date
Are you sure you're not an LLM?
- llmslave2 2 months ago
  
  Opus 4.1 was released in August or smth.