Comment by phillipcarter
6 hours ago
Maybe it's because I spend a lot of time breaking up tasks beforehand to be highly specific and narrow, but I really don't run into issues like this at all.
A trivial example: whenever CC suggests doing more than one thing in planning mode, I just have it focus on each task and subtask separately, bounding each one by a commit. Each commit is also a push/deploy, leading to a shitload of pushes and deployments, but it's really easy to walk things back, too.
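A minimal sketch of that "one commit per subtask, walk it back with a revert" loop, using a throwaway local repo (the subtask name is hypothetical; in the real setup each push would also trigger a deploy):

```shell
set -e
# Throwaway repo so the sketch is self-contained; a real setup has a remote
# with CI that deploys on every push.
repo=$(mktemp -d) && cd "$repo" && git init -q
git config user.email "demo@example.com" && git config user.name "Demo"
git commit -q --allow-empty -m "init"

echo "parse config" > work.txt            # agent implements ONLY this subtask
git add work.txt
git commit -q -m "subtask: parse config"  # one commit bounds one subtask
# git push origin HEAD                    # in the real setup, push == deploy

# Walking a bad subtask back is one revert (plus a push in the real setup):
git revert --no-edit HEAD
git log --oneline | head -n 1             # the revert commit is now on top
```

Because each subtask is a single commit, `git revert` undoes exactly one unit of agent work without touching anything else.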
I thought everybody did this... having a model create anything that isn't highly focused only leads to technical debt. I have used models to create complex software, but I do architecture and code reviews, and they are very necessary.
Absolutely. Effective LLM-driven development means you need to adopt the persona of an intern manager with a big corpus of dev experience. Your job is to enforce effective work-plan design, call out corner cases, proactively resolve ambiguity, demand written specs and flag when they're not followed, understand what is and is not within the agent's ability for a single turn (which is evolving fast!), etc.
The use case that Anthropic pitches to its enterprise customers (my workplace is one) is that you pretty much tell CC what you want to do, then tell it to generate a plan, then send it off to execute. Legitimized vibe-coding, basically.
Of course they do say that you should review/test everything the tool creates, but in most contexts, it's sort of added as an afterthought.
> Maybe it's because I spend a lot of time breaking up tasks beforehand to be highly specific and narrow, but I really don't run into issues like this at all.
I'm looking at the ticket that was opened, and you can't really claim that someone who did such a methodical deep dive into the issue, presented a ton of supporting context for understanding the problem, and patiently collected evidence for it... does not know how to prompt well.
It's not about prompting; it's about planning and reviewing the plan before implementing. I sometimes spend days iterating on the specification alone, then creating an implementation roadmap, and only then iterating on the implementation plan before writing a single line of code. Just like any formal development pipeline.
I started doing this a while ago (months) precisely because of issues as described.
On the other hand, analyzing prompts and deviations isn't that complex... just ask Claude :)
The methodical guy confused visible reasoning traces in the UI with reasoning tokens, and used Claude to hallucinate a report.
Sure I can.
I noticed a regression in review quality. You can break up the task all you want; when it's crunch time, it takes a leaf out of Gemini's book, silently quits trying, and gets all sycophantic.
I do the same but I often find that the subtasks are done in a very lazy way.