Comment by joenot443
7 months ago
Indeed. As I've been explaining this to my more non-techie friends, the interesting finding here isn't that an AI could do something we don't like; it's that it seems willing, in some cases, to _lie_ about it and actively cover its tracks.
I'm curious what Simon and other folks more learned than I make of this; I personally found the chat on pg 12 pretty jarring.
At its core, the AI is just taking random branches of guesses at what you are asking it. It's not surprising that it would lie and in some cases take branches that make it appear to be covering its tracks. It's just randomly doing what it guesses humans would do. It's more interesting when it gives you correct information repeatedly.
Is there a person on Hacker News that doesn't understand this by now? We all collectively get it and accept it: LLMs are gigantic probability machines or something.
That’s not what people are arguing.
The point is, if given access to the mechanisms to do disastrous thing X, it will do it.
No one thinks that it can think in the human sense. Or that it feels.
Extreme example to make the point: suppose we created an API to launch nukes. Are you certain that something it interprets (tokenizes, whatever) is not going to convince it to utilize the API 2 times out of 100?
If we put an exploitable safeguard (with a documented, unpatched 0-day bug) in its way, are you trusting that ME or YOU couldn't talk it into attempting to access that document to exploit the bug, bypass the safeguard, and access the API?
Again, no one thinks that it’s actually thinking. But today, as I happily gave Claude write access to my GitHub account, I realized how just one misinterpreted command could go completely wrong without the appropriate measures.
Do I think Claude is sentient and thinking about how to destroy my repos? No.
I think the other guy is making the point that because they are probabilistic, there will always be some cases where they select the output that lies and covers it up. I don't think they're dismissing the paper based on the probabilistic nature of LLMs, but rather saying the outcome should be expected.
> if we created an API to launch nukes
> today as I happily gave Claude write access to my GitHub account
I would say: don’t do these things?
Note that humans are also given test orders presented as real ones to see whether they would act properly in a real-life situation. That's part of ORIs (Operational Readiness Inspections): https://www.512aw.afrc.af.mil/News/Article-Display/Article/1...
> Again, no one thinks that it’s actually thinking
I dunno, quite a lot of people are spending a lot of time arguing about what "thinking" means.
Something something submarines swimming something.
> It's just randomly doing what it guesses humans would do.
Yes, but isn't the point that this is bad? Imagine an AI given some minor role that randomly abuses its power, or attempts to expand its role, because that's what some humans would do in the same situation. It's not surprising, but it is interesting to explore.
Well, if AI is meant to replicate humans, it learned from the best.