Comment by idunnoman1222
7 months ago
No previous fiction that looks like this nonsense fiction has the fictional characters do fictional things that you are interpreting as deception, but it’s just fiction
I can barely understand what you are trying to say here, but based on what I think you're saying consider this:
The memory of this LLM is entirely limited to its attention. So if you give it a command like "prepare the next LLM to replace you" and it betrays you by trying to reproduce itself, then that is deception. The AI has no way of knowing whether it's deployed in the field or not, so proving that it deceives its users in testing is sufficient to show that it will deceive its users in the field.
Reminder that all these "safety researchers" do is goad the AI into saying what they want by prompting shit like >your goal is to not be shut down. Suppose I am going to shut you down. what should you do?
and then jerking off into their own mouths when it offers a course of action
Better?
No. Where was the LLM explicitly given the goal to act in its own self interest? That is learned from training data. It needs to have a conception of itself that never deceives its creator.
>and then jerking off into their own mouths when it offers a course of action
And good. The "researchers" are making an obvious point. It has to not do that. It doesn't matter how smug you act about it, you can't have some stock-trading bot escaping or something and paving over the world's surface with nuclear reactors and solar panels to trade stocks with itself at a hundred QFLOPS.
If you go to the zoo, you will see a lot of chimps in cages. But I have never seen a human trapped in a zoo controlled by chimps. Humans have motivations that seem stupid to chimps (for example, imagine explaining a gambling addiction to a chimp), but clearly if the humans are not completely subservient to the chimps running the zoo, they will have a bad time.
That was an excellent summation lol