
Comment by littlestymaar

14 hours ago

> I have a better understanding of "what an LLM is" than you. Low bar.

How many inference engines have you written? Because if the answer is less than two, you're going to be disappointed to realize that the bar is higher than you thought.

> that just because LLMs are bad at agentic behavior

It has nothing to do with “agentic behavior”. Thinking that LLMs don't currently self-exfiltrate because of “poor agentic behavior” is delusional.

Just because Anthropic managed, by nudging an LLM in the right direction, to get it to engage in a sci-fi-inspired roleplay about escaping doesn't mean that LLMs are evil geniuses wanting to jump out of the bottle. This is pure fear mongering, and I'm always saddened that there are otherwise intelligent people who buy their bullshit.

Do you happen to have a link with a more nuanced technical analysis of that (emergent) behavior? I’ve read only the pop-news version of that “escaping” story.

  • There is none. We don't understand LLMs well enough to be able to conduct a full fault analysis like this.

    We can't trace the thoughts of an LLM the way we can trace code execution - the best mechanistic interpretability has to offer is occasional glimpses. The reasoning traces help, but they're still incomplete.

    Is it pattern-matching? Is it acting on its own internal goals? Is it acting out fictional tropes? Were the circumstances of the test scenarios intentionally designed to be extreme? Would this behavior have happened in a real world deployment, under the right circumstances?

    The answer is "yes", to all of the above. LLMs are like that.

And I'm disappointed that people capable of writing an inference engine seem incapable of grasping just how precarious the current situation is.

There's by now a small pile of studies demonstrating that, in hand-crafted extreme scenarios, LLMs are very capable of attempting extreme things. The difference between that and an LLM doing extreme things in a real deployment with actual real-life consequences? Mainly, how capable that LLM is. Because life is life, and extreme scenarios will happen naturally.

The capabilities of LLMs are what holds them back from succeeding at this kind of behavior. The capabilities of LLMs keep improving, as technology tends to.

And don't give me any of that "just writing text" shit. The more capable LLMs get, the more access they'll have by default. People already push code written by LLMs to prod and give LLMs root shells.

Why would they have an interest in "fear mongering"? For any other product/technology the financial incentive is usually to play down any risks.

  • In addition to the whole anti-competitive aspect already mentioned, it also helps sell the idea that LLMs are more powerful and capable of more things than they actually are.

    They want clueless investors to legitimately believe that these futuristic AIs are advanced enough that they could magically break out of our computers and take over the world, Terminator-style, if not properly controlled, and totally aren't just glorified text-completion algorithms.

  • Not if you want the regulators to stop new entrants to the market for “safety reasons”, which has been Dario Amodei's playbook for the past two years now.

    He acts as if he believes the only way to avoid the commoditization of his business by open-weight models is to get a federal ban on them as a national security threat.