Comment by Sevii

4 months ago

Are agents actually capable of answering why they did things? An LLM can review the previous context, add your question about why it did something, and then use next token prediction to generate an answer. But is that answer actually why the agent did what it did?

9 comments

Sevii

gas9S9zw3P9c 4 months ago

It depends. If you have an LLM that uses reasoning the explanation for why decisions are made can often be found in the reasoning token output. So if the agent later has access to that context it could see why a decision was made.

Kubuxu 4 months ago
Reasoning, in majority of cases, is pruned at each conversation turn.
- DonHopkins 4 months ago
  
  The cursor-mirror skill and cursor_mirror.py script lets you search through and inschpekt all of your chat histories, all of the thinking bubbles and prompts, all of the context assembly, all of the tool and mcp calls and parameters, and analyze what it did, even after cursor has summarized and pruned and "forgotten" it -- it's all still there in the chat log and sqlite databases.
  cursor-mirror skill and reverse engineered cursor schemas:
  https://github.com/SimHacker/moollm/tree/main/skills/cursor-...
  cursor_mirror.py:
  https://github.com/SimHacker/moollm/blob/main/skills/cursor-...
  The German Toilet of AI "The structure of the toilet reflects how a culture examines itself." — Slavoj Zizek German toilets have a shelf. You can inspect what you've produced before flushing. French toilets rush everything away immediately. American toilets sit ambivalently between. cursor-mirror is the German toilet of AI. Most AI systems are French toilets — thoughts disappear instantly, no inspection possible. cursor-mirror provides hermeneutic self-examination: the ability to interpret and understand your own outputs. What context was assembled? What reasoning happened in thinking blocks? What tools were called and why? What files were read, written, modified? This matters for: Debugging — Why did it do that? Learning — What patterns work? Trust — Is this skill behaving as declared? Optimization — What's eating my tokens? See: Skill Ecosystem for how cursor-mirror enables skill curation.
  ----
  >Žižek on toilets. Slavoj Žižek during an architecture congress in Pamplona, Spain.
  >The German toilets, the old kind -- now they are disappearing, but you still find them. It's the opposite. The hole is in front, so that when you produce excrement, they are displayed in the back, they don't disappear in water. This is the German ritual, you know? Use it every morning. Sniff, inspect your shits for traces of illness. It's high Hermeneutic. I think the original meaning of Hermeneutic may be this.
  https://en.wikipedia.org/wiki/Hermeneutics
  >Hermeneutics (/ˌhɜːrməˈnjuːtɪks/)[1] is the theory and methodology of interpretation, especially the interpretation of biblical texts, wisdom literature, and philosophical texts. Hermeneutics is more than interpretive principles or methods we resort to when immediate comprehension fails. Rather, hermeneutics is the art of understanding and of making oneself understood.
  ----
  Here's an example cursor-mirror analysis of an experiment with 23 runs with four agents playing several turns of Fluxx per run (1 run = 1 completion call), 1045+ events, 731 tool calls, 24 files created, 32 images generated, 24 custom Fluxx cards created:
  Cursor Mirror Analysis: Amsterdam Fluxx Championship -- Deep comprehensive scan of the entire FAFO tournament development:
  amsterdam-flux CURSOR-MIRROR-ANALYSIS.md:
  https://github.com/SimHacker/moollm/blob/main/skills/experim...
  amsterdam-flux simulation runs:
  2 replies →
kgeist 4 months ago
LLMs often already "know" the answer starting from the first output token and then emulate "reasoning" so that it appeared as if it came to the conclusion through logic. There's a bunch of papers on this topic. At least it used to be the case a few months ago, not sure about the current SOTA models.
- nrds 4 months ago
  
  Wait, that's not right, let me think through this more carefully...

bananapub 4 months ago

of course not, but it can often give a plausible answer, and it's possible that answer will actually happen to be correct - not because it did any - or is capable of any - introspection, but because it's token outputs in response to the question might semi-coincidentally be a token input that changes the future outputs in the same way.

Onavo 4 months ago

Well, the entire field of explainable AI has mostly thrown in the towel..