Marketing. "Oh, look how powerful our model is, we can barely contain its power."
This has been a thing since GPT-2; why do people still parrot it?
I don’t know what your comment is referring to. Are you criticizing the people parroting “this tech is too dangerous to leave to our competitors” or the people parroting “the only people who believe in the danger are in on the marketing scheme”?
fwiw I think people can perpetuate the marketing scheme while being genuinely concerned about misaligned superintelligence
Even hackernews readers are eating it right up.
This place is shockingly uncritical when it comes to LLMs. Not sure why.
Hilarious for this to be downvoted.
"LLMs are deceiving their creators!!!"
Lol, you all just want it to be true so badly. Wake the fuck up, it's a language model!
A very complicated pattern matching engine providing an answer based on its inputs, heuristics, and previous training.
Great. So if that pattern matching engine matches the pattern of "oh, I really want A, but saying so will elicit a negative reaction, so I emit B instead because that will help make A come about" what should we call that?
We can handwave by defining "deception" as requiring intent and carefully carve out the definition so that LLMs cannot possibly do what we've defined "deception" to be, but then we still need a word to describe what LLMs do do when they pattern match as above.
The pattern matching engine does not want anything.
If training incentivizes the engine to generate outputs that reduce negative reactions (e.g. as scored by sentiment analysis), it can produce outputs that contradict the tokens already in context; a toy sketch of that selection pressure is below.
"Want" requires intention and desire. Pattern matching engines have none.
We are talking about LLMs, not humans.