Comment by grey-area

19 hours ago

> Read that again. The agent itself enumerates the safety rules it was given and admits to violating every one. This is not me speculating about agent failure modes. This is the agent on the record, in writing.

Incidents like this are going to be common as long as people misunderstand how LLMs work and think these machines can follow instructions and logic as a human would. Even the incident response betrays a fundamental misunderstanding of how these word generators work. If you ask it why, this new instance of the machine will generate plausible text based on your prompt about the incident; that is all. There is no "why" there, only a "how" based on your description.

The entire concept of agents assumes agency and competency; LLM agents have neither. They generate plausible text.

That text might hallucinate data, replace keys, issue delete commands, etc. Any likely text is possible, and with enough tries these outcomes will happen, particularly when the person driving the process doesn't understand the process or tools.

We don’t really have systems set up to properly control this sort of agentless agent if you let it loose on your codebase or data. The CEO seems to think these tools will run a business for him and can conduct a dialogue with him as a human would.

"I literally requested no screw ups, and this is a screw up"

I bet these people are bad at managing humans too.

  • Maybe - humans have agency; they understand actions and consequences.

    AI agents do not have agency(!); they have no understanding of consequences. They actually have no understanding. At all.

This is what I am seeing more and more of, both in tech online and in the minds of people around me. Despite people's innate curiosity about how LLMs work, they still don't understand that, at the end of the day, these are just models. Augmented with tools and more capable than ever, yes, but still a piece of math. To expect of it anything other than credible output is science fiction.

I have the opposite view - LLMs have many similarities with humans. A human, especially a poorly trained one, could have made the same mistake. A human with amnesia could have come up with reasons similar to that LLM's.

While LLMs generate "plausible text", humans just generate "plausible thoughts".

  • Just because it sounds coherent doesn't mean it is. You can make up a false equivalence for anything if you try hard enough: a sheet of plywood also has many similarities with humans (made from carbon, contains water, breaks when hit hard enough), but that doesn't mean they are even remotely equal.

    • I didn't write they were equal. I wrote they are similar in many ways.

      Comparing LLMs to humans makes much more sense than comparing them to computer programs.

Humans also don't follow given rules. Or we wouldn't need jails. We wouldn't need any security. We wouldn't even need user accounts.

  • Humans are able to follow rules. If you tell someone "don't press the History Eraser Button", and they decide they agree with the rule, they won't press the button unless by accident. If they really believe in the importance of the rule, they will take measures to stop themselves from accidentally pressing it, and if they believe in it strongly enough, they'll take measures to stop anyone from pressing it at all.

    No matter how you insist to an LLM not to press the History Eraser Button, the mere fact that it's been mentioned raises the probability that it will press it.

  • I don’t mean that in a small way (i.e. sometimes they don’t follow rules); I mean it in the more important sense that they don’t have a sense of right or wrong, and the instructions we give them are just more context, not hard constraints as most humans would see them.

    This leads to endless frustration as people try to use text to constrain what LLMs generate; it’s fundamentally not going to work because of how they function.