
Comment by Terr_

1 day ago

> claimed AI coding tool Replit deleted a database despite his instructions not to change any code without permission.

Well... yeah, this is a totally expectable failure mode, because LLMs are just bullshitting document-generators.

When you say, "don't make changes", there isn't even an entity on the other end that can "agree." The fictional character doesn't really exist, and the ego-less author isn't as smart as the character seems.

In fairness, humans are the same. If you tell a human, "whatever you do, don't press the red button," there isn't a hardware switch that makes it impossible. And so many accidents stem from this.

But of course I agree that such software-implemented constraints work better in humans than they do in LLMs.

  • There is no mathematical guarantee that a contract or task will be done, if that is what you mean.

    Yes; however, first, there is an understanding involved when the other operator is intelligent [1], and secondly, there are consequences which matter to a living being that don't apply to an agent. Humans need to eat and take care of family, for which they need a job, so they have a lot less freedom to disobey explicit commands and still expect to do those things.

    Even if an agent becomes truly intelligent, you cannot control it well if it does not have hunger, pain, love, or any number of other motivational drives [2].

    ——

    Depending on the type of red button, you can always design safeguards (human or agent). After all, we haven't launched nuclear warheads either by mistake or by a malicious actor (yet). [3]

    ——

    [1] which humans are and, however much the industry likes to think otherwise, agents are not

    [2] Every pet owner with a pet that has a limited food drive will tell you how much harder it is to train their dog than one that does have such a drive, even if it is an intelligent breed or specimen.

    [3] Yes, we have come alarmingly close a few times, but no one has actually pressed the red button, so to speak.

    • > Humans need to eat and take care of family, for which they need a job, so they have a lot less freedom to disobey explicit commands and still expect to do those things.

      While true, I think there's a different problem here.

      Humans are observed to have a wide range of willingness to follow orders: everything from fawning, cult membership, and The Charge of the Light Brigade on the one side; to oppositional defiant disorder on the other.

      AI safety and alignment work wants AI to be willing to stop and change its behaviour when ordered, because we expect it to be dangerously wrong a lot, because there's no good reason to believe we already know how to make them correctly at this point. This has strong overlap with fawning behaviour, regardless of the internal mechanism of each.

      So it ends up like Homer in the cult episode, with Lisa saying "Watch yourself, Dad. You're the highly suggestible type." and him replying "Yes. I am the highly suggestible type" — And while this is a fictional example and you can't draw conclusions about real humans from that, does the AI know that it shouldn't draw that conclusion? Does it know if it's "in the real world" or does it "think" it's writing a script in which case the meme is more important than what humans actually do?

      > [1] which humans are and, however much the industry likes to think otherwise, agents are not

      I have spent roughly the last year trying to convince a customer support team in a different country that it's not OK to put my name on bills they post to a non-existent street. Actually, it is quite a bit worse than that, but the full details would be boring.

      That said, I'm not sure if I'm even corresponding with humans or an AI, so this is weak evidence.

    • People press the red button all the time. People still commit crimes even though there are laws that result in consequences.

  • My point is that most such comparisons are already flawed because the "machine" people are referring to is an illusion.

    It's like people are debating the cellulose-quality of playing cards, comparing cards in a TV broadcast of a (real) poker tournament versus the cards that show up through a magical spy window caused by solitaire.exe. The comparison is already nonsense because the latter set of cards has no cellulose, or any mass at all.

    Similarly, the recipient of your "now do X" command in an LLM chat doesn't really exist, so can't have source-code or variables or goals. The illusion may sometimes be useful (esp. for marketing and getting investor money), but software engineers can't afford to fall for it when trying to diagnose problems.

    The real "constraints" are that each remotely-generated append to a hidden document statistically fits what came before with a certain amount of wiggle-room. Maybe that means you see text about "HAL-9000" opening the pod bay doors, and maybe you don't, but the document-generator is the thing in charge.