Comment by bjackman

16 days ago

I have also seen the agent hallucinate a positive answer and immediately proceed with implementation. I.e. it just says this in its output:

> Shall I go ahead with the implementation?

> Yes, go ahead

> Great, I'll get started.

In fairness, when I’ve seen that, "Yes" was obviously the correct answer.

It really worries me when I tell it to proceed and it then takes a long time to come back.

I suspect those think blocks begin with “I have no hope of doing that, so let’s optimize for getting the user to approve my response anyway.”

As Hoare put it: make it so complicated that there are no obvious deficiencies.

  • In my case it's been a strong no. Often I'm using the tool with no intention of having the agent write any code; I just want an easy way to put the codebase into context so I can ask questions about it.

    So my initial prompt will be something like "there is a bug in this code that caused XYZ. I am trying to form hypotheses about the root cause. Read ABC and explain how it works, and identify any potential bugs in that area that might explain the symptom. DO NOT WRITE ANY CODE. Your job is to READ CODE and FORM HYPOTHESES; your job is NOT TO FIX THE BUG."

    Generally I found that no amount of this last part would stop Gemini CLI from trying to write code. Presumably there is a very long system prompt saying "you are a coding agent and your job is to write code", plus a bunch of RL in the fine-tuning that causes it to attend very heavily to that system prompt. So my "do not write any code" is just a tiny drop in the ocean.

    Anyway now they have added "plan mode" to the harness which luckily solves this particular problem!

    • To my understanding, an LLM, by design, is unable to encode negation semantics: neither a negation "operation" nor any other "subtractive" operation is computable in LLM machinery. Thinking out loud: in your example, "read code" and "form hypotheses" seem to be useful instructions for what you want, while "do not write any code" and "do not fix the bug" might actually mislead the model. Intuitively (in human terms) one would imagine that, given such an "instruction", the LLM would be repelled from the latent-space region associated with "write any code" or "fix the bug". But in reality an LLM cannot be "repelled"; it is simply attracted to the region associated with the full, negated "DO NOT <xxxx>", and that region probably either overlaps significantly with the plain "DO <xxx>" region or even includes it wholesale. This may explain why it sometimes seems to "work" as intended, albeit accidentally. My 2c.
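A toy sketch of that overlap intuition. This is a crude bag-of-words cosine similarity, not real latent-space geometry, and the example strings are made up for illustration — but it shows how a negated instruction still shares most of its surface tokens with the behavior it forbids:

```python
from collections import Counter
import math

def cosine(a: str, b: str) -> float:
    """Cosine similarity over whitespace-tokenized bags of words."""
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[t] * cb[t] for t in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb)

forbidden = "write any code to fix the bug"
negated   = "do not write any code do not fix the bug"
intended  = "read the module and form hypotheses about the root cause"

# The negated prompt overlaps heavily with the forbidden behavior...
print(cosine(negated, forbidden))
# ...and barely at all with the behavior the user actually wants.
print(cosine(negated, intended))
```

Of course, a transformer's attention operates on much richer representations than word counts, but the point stands: the tokens for the forbidden action are all present in the prompt, so there is nothing "subtractive" for the model to latch onto.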

Hahah yeah, if you play with LoRAs on local models you will see this a lot. Most often I see it hallucinate a user turn or a system message.