Comment by Loeffelmann

25 days ago

If you ever work with LLMs you know that they quite frequently give up.

Sometimes it's a

    // TODO: implement logic

or a

"this feature would require extensive logic and changes to the existing codebase".

Sometimes they just declare their work done, ignoring failing tests and builds.

You can nudge them to keep going but I often feel like, when they behave like this, they are at their limit of what they can achieve.

If I tell it to implement something, it will sometimes declare its work done before it actually is. But if I give Claude Code a verifiable goal, like making the unit tests pass, it will work tirelessly until that goal is achieved. I don't always like the solution, but the tenacity everyone is talking about is there.

  • > but the tenacity everyone is talking about is there

    I always double-check that it hasn't simply excluded the failing test.

    The last time I had this, I discovered it later in the process. When I pointed this out to the LLM, it responded that it had acknowledged the fact of ignoring the test in CLAUDE.md, and that this was justified because [...]. In other words, "known issue, fuck off".

  • Tools in a loop people, tools in a loop.

    If you don't give the agent the tools to deterministically test what it did, you're just vibe coding in its worst form.
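The "tools in a loop" point can be sketched in a few lines. This is a toy illustration, not anyone's actual implementation: `run_agent_step` is a stand-in for one LLM call, and `run_checks` stands in for a real test suite (in practice you would shell out to something like `pytest` and trust only its exit code, never the model's claim that it is finished).

```python
# Toy sketch of "tools in a loop": the agent keeps editing until a
# deterministic check passes. All names here are hypothetical.

def run_checks(code: str) -> list[str]:
    """Deterministic verifier: returns failures; an empty list means green."""
    failures = []
    if "TODO" in code:
        failures.append("unimplemented TODO left in code")
    if "def add" not in code:
        failures.append("missing function: add")
    return failures

def run_agent_step(code: str, failures: list[str]) -> str:
    """Stand-in for one LLM call that patches the code given the failures."""
    if "missing function: add" in failures:
        code += "\ndef add(a, b):\n    return a + b\n"
    if "unimplemented TODO left in code" in failures:
        code = code.replace("# TODO: implement logic", "pass")
    return code

def agent_loop(code: str, max_steps: int = 5) -> tuple[str, bool]:
    """The loop, not the model, decides when the work is done."""
    for _ in range(max_steps):
        failures = run_checks(code)
        if not failures:
            return code, True
        code = run_agent_step(code, failures)
    return code, not run_checks(code)

final, ok = agent_loop("# TODO: implement logic")
print(ok)  # True once the checks pass
```

The point of the sketch is that "done" is defined by the verifier's output, which is exactly why the sibling comment's caveat matters: if the agent can edit the verifier (e.g. skip a failing test), the loop's guarantee evaporates.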

> If you ever work with LLMs you know that they quite frequently give up.

If you try to single shot something perhaps. But with multiple shots, or an agent swarm where one agent tells another to try again, it'll keep going until it has a working solution.

  • Yeah, exactly. This is a scope problem; actual input/output size is always limited.

    > I am 100% sure CC etc are using multiple LLM calls for each response, even though from the response streaming it looks like just one.

Nope, not for me, unless I tell it to.

Context matters for an LLM, just like for a person. When I write code I add TODOs because we can't context-switch to every other problem we notice along the way.

You can keep the agent fixated on the task AND have it create these TODOs, but ultimately it is your responsibility to find them and fix them (with another agent).

Using LLMs to clean those up is part of the workflow that you're responsible for (... for now). If you're hoping to get ideal results in a single inference, forget it.