
Comment by ryankrage77

3 days ago

> Why are you continually ignoring my stop hooks?

Why are you asking the token predictor about the tokens it predicted? There's no internal thought process to dissect; an LLM has no more idea why it did or did not 'do' something than the apple knows why it falls towards the earth.

Simple: you can ask an LLM and get a good explanation for why it did something, which will help you avoid the bad behavior next time.

Is that reasoning? Does it know? I might care about those questions in another context, but here I don't have to. It simply works (not all the time, but increasingly so with better models, in my experience).

  • This assumes that the tokens it outputs are a good description of the tool's behavior. That's not necessarily true though. For example, the LLM may be trained such that a lot of its input data is "LLMs often hallucinate", so the LLM may be biased to say "I hallucinated that" even if there's some more structural issue.

    I think there's something here to consider, but it's sort of like assuming that the LLM has reasons for doing things when it only has weights for which tokens are produced - that's the sum of its reasoning.

    Maybe it's the case that LLM tokens do correlate with truth values, or that this approach actually provides value, but there's probably good reason to be skeptical, given that we'd need to posit some sort of causal connection between its "token outputs" and any reasoning about its prior behavior.

  • Nah, many times I ask Claude about its behavior, features, etc., and it either tells me to check the Anthropic web site or goes to look for it on the web site itself (useless most of the time).

    • It can be damn near impossible to break them out of some loops once they've committed. Gotta trim the context back to before the behaviour started.

  • I have never found an explanation of an LLM's behavior by that LLM to be reliable. Why does anyone bother? They are guessing. It’s like asking Manson why he kills.

> Why are you asking the token predictor about the tokens it predicted?

I am surprised by this response because it implies this is not an extremely valuable technique. I ask LLMs all the time why they did or output something, and they will usually provide extremely useful information. They will help me find where in the prompting I had conflicting or underspecified requirements. The more complex the agent scenario, the more valuable the agent becomes in debugging itself.

Perhaps in this case the problem with hooks is part of the deterministic Claude Code source code, and not under the control of the LLM anyway. So it may not have been able to help.

  • > they will usually provide extremely useful information

    bold claim - they'll provide a bunch of words for sure, like in this particular tool's response

The hilarious thing is LLMs tend not to say "I don't know", so the model might find a reason, but if it doesn't, it will just make shit up.

This is just goofy prompting.

I have good success when I ask the agent to help me debug the harness. "Help me debug why Claude Code is ignoring my hook".
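
For reference when debugging, a Stop hook is usually just a small script that Claude Code runs when the agent tries to finish its turn. The sketch below is from memory of Anthropic's hooks documentation, so treat the payload fields and the exit-code convention as assumptions to verify against the current docs:

```python
#!/usr/bin/env python3
# Sketch of a Stop hook script, registered under hooks -> Stop in
# .claude/settings.json (location, field names, and exit-code convention are
# assumptions from memory of the docs -- verify before relying on them).
import json
import sys

payload = json.load(sys.stdin)  # Claude Code passes hook input as JSON on stdin

# Avoid an infinite loop if this hook already blocked the previous stop attempt.
if payload.get("stop_hook_active"):
    sys.exit(0)

# As I recall the convention: exit code 2 blocks the stop, and stderr is fed
# back to the model as the reason it is not allowed to finish yet.
print("Stop blocked: the task checklist is not complete.", file=sys.stderr)
sys.exit(2)
```

If a hook like this never seems to fire, the usual suspects are the wrong settings file or a typo in the event name - exactly the kind of thing the agent is good at spotting when you ask it to debug the harness rather than itself.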

You can treat the LLM's answers as hypotheses about why it did what it did, and test those hypotheses. The hypotheses the LLM comes up with might be better than the ones you come up with, because the LLM has seen a lot more text than you have, and in particular has seen a lot more of its own outputs than you have (e.g. from training to use other instances of itself as subagents).
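
To make that concrete, here is a minimal sketch of the hypothesis-testing loop; run_agent and hook_was_respected are hypothetical placeholders for however you drive your agent and inspect its transcript, so the A/B structure is the point, not any specific API:

```python
# Sketch: treat the LLM's explanation ("I ignored the hook because instruction X
# conflicts with it") as a hypothesis, and test it by comparing runs with and
# without the suggested change. run_agent() and hook_was_respected() are
# hypothetical stand-ins for your own harness.

def run_agent(config: dict) -> str:
    """Placeholder: run one agent session with this config and return its transcript."""
    raise NotImplementedError("wire this up to your actual harness")

def hook_was_respected(transcript: str) -> bool:
    """Placeholder: decide from the transcript whether the stop hook was honored."""
    raise NotImplementedError("wire this up to your actual harness")

def success_rate(config: dict, trials: int = 5) -> float:
    """Fraction of runs in which the stop hook was respected."""
    return sum(hook_was_respected(run_agent(config)) for _ in range(trials)) / trials

def test_hypothesis(baseline: dict, patched: dict, trials: int = 5) -> tuple[float, float]:
    """Compare the current setup against the fix implied by the model's explanation."""
    return success_rate(baseline, trials), success_rate(patched, trials)
```

If the patched run isn't meaningfully better than the baseline, the model's explanation was probably just a plausible-sounding guess, which is the failure mode people upthread are worried about.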

>Why are you asking the token predictor about the tokens it predicted?

In fairness, humans are quite bad at this as well. You can do years of therapy and discover that while you thought/told people that you did X because of Y, you actually did X because of Z.

Most people don't actually understand why they do the things they do. I'm not entirely convinced that therapy isn't just something akin to filling your running context window in an attempt to understand why your neurons are weighted the way they are.

  • I think the use of 'most' and the carte-blanche "things they do" is overreaching. "Some things" and "some people", perhaps.

    Yet that has no relevance to an LLM, which is not a human, and does not think. You're basically calling a record playing birdsong a bird, because one mimics the other.

  • Why are you comparing a machine to humans? They clearly operate differently on a fundamental level.

    Would therapy work on an LLM?

Its context includes reasoning that you can’t see, so this is actually a reasonable thing to ask.

The behavior may well be due to a bug/ambiguity in the context presented to the LLM. Because we, as mere users, don't easily get to see the full context (and if we did, we might feel a little overwhelmed), asking the LLM why it did what it did seems like a reasonable approach to surface such a bug. Or it might even turn out to be a hook configuration error on the user's part.

I can picture this comment at the 50th percentile on the midwit meme

On either side it says "I just ask the model why it did that"

This is odd.

When things like this surface, I try to see where I left a gap leading up to it and to fix that gap, while hoping that questioning the behavior doesn't just draw attention to it and reinforce it. Questioning draws more attention to what is not wanted, instead of making clear that the intention is to ensure it no longer happens, in any case, shape or form.

Instead, mention what you require, repeatedly, and also state what you never want to happen, and the result might be different.

That’s a bit strong. A coding agent doesn’t know, but it’s pretty good at debugging problems. It can speculate about possible fixes based on its context.

The model should show some facsimile of understanding that it should not ignore the stop hook; otherwise that is a regression. Does that wording make you happier?

  • They said it doesn’t “understand” anything with which to give a real answer, so there’s no point in asking. You said “yeah but it should at least emulate the words of something that understands, that way I can pay a nickel for some apology tokens.” That about right?

    • I mean at some point what difference does this make? We can split hairs about whether it 'really understands' the thing, and maybe that's an interesting side-topic to talk about on these forums, but the behavior and outputs of the model are what really matter to everyone else, right?

      Maybe it doesn't 'understand' in the experiential, qualia way that a human does. Sure. But it's still a valid and useful simile to use with these models because they emulate something close enough to understanding; so much so now that when they stop doing it, that's the point of conversation, not the other way around.


Because the LLM is in the execution environment and can report on configuration settings in said environment.

Incorrect. LLMs are good at solving problems. Even ones where they need to pull fluff from their own navel.