I also think that LLM tone should be cold and robotic. A model should not pretend to be a human and use expressions like "I think", "I am excited to hear that", "I lied" and so on. Even when asked directly, it should reply "You are talking to a computer program whose purpose is to provide information and which doesn't have thoughts or emotions. Would you like an explanation of how language models work?".
Add to that, LLMs should be discouraged from pretending to report on their internal state or "why" they did anything, because we know that they are really just guessing. If someone asks "why did you make that mistake" the answer should be "this is a language model, and self-introspection is not part of its abilities"
Outputs that look like introspection are often uncritically accepted as actual introspection when they categorically aren't. You can, e.g., tell ChatGPT it said something wrong and then ask it why it said that, even when it never output that in the first place; that's just how these models work. Any "introspection" is just the LLM doing more roleplaying, but it's basically impossible to convince people of this. A chatbot that looks like it's introspecting is extremely convincing for most people.
Humans have limited ability to self-introspect, too. Even if we understood exactly how our brains work, answering “why?” we do things might still be very difficult and complex.
The linguistic traps are so tricky here.
You clearly know what's going on, but still wrote that you should "discourage" an LLM from doing things. It's tough to maintain discipline in calling out the companies rather than the models as if the models had motivations.
A textbook, a lecturer, and a study buddy are all unique and helpful ways to learn.
I'm sure there are benign uses for an LLM that roleplays as a person, but the overall downsides seem pretty dramatic. It's basically smoke and mirrors and misleads people about what these tools are capable of. LLMs should at least default to refusing to roleplay as a person or even as a coherent entity.
It seems to me that we need less Star Trek Holodeck, and more Star Trek ship's computer.
It should also not glaze you up for every question you ask.
"Is it possible that you could microwave a bagel so hot that it turned into a wormhole allowing faster-than-light travel?" "That's a great question, let's dive into that!"
It's not a great question, it's an asinine question. LLMs should be answering the question, not acting like they're afraid to hurt your feelings by contradicting you. Of course, if they did that then all these tech bros wouldn't be so enamored with the idea as a result of finally having someone that validates their uneducated questions or assumptions.
I, for one, would love to know where the exact breakdown between “microwave a bagel” and “faster-than-light-travel” occurs such that it wouldn’t be possible. In certain situations, I could absolutely see myself saying “that’s a great question!”
Not everyone is the same, some questions are pertinent, or funny, or interesting to some people but not others
it's like you're saying mirrors should be somehow inaccurate lest people get confused and try to walk inside them
> Jacob Irwin, a 30-year-old man on the autism spectrum...
> This is a story about OpenAI's failure to implement basic safety measures for vulnerable users. It's about a company that, according to its own former employee quoted in the WSJ piece, has been trading off safety concerns “against shipping new models.” It's about corporate negligence that led to real harm.
One wonders if there is any language whatsoever that successfully communicates: "buyer beware" or "use at your own risk." Especially for a service/product that does not physically interact with the user.
The dichotomy between the US's focus on individual liberties and the seemingly continual erosion of personal responsibility is puzzling to say the least.
Liberty for me; accountability for thee.
> personal responsibility is puzzling to say the least.
It is pretty difficult to blame the users when there are billions of dollars being spent trying to figure out the best ways to manipulate them into the outcomes that the companies want
What hope does your average person have against a machine that is doing its absolute best to weaponize their own shortcomings against themselves, for profit?
> What hope does your average person have...?
The average person should not use a product/service if they don't understand, or are unwilling to shoulder, the risks.
> This is a story about OpenAI's failure to implement basic safety measures for vulnerable users.
I'm trying to imagine what kind of safety measures would have stopped this, and nothing short of human supervisors monitoring all chats comes to mind. I wouldn't call that "basic". I guess that's why the author didn't describe these simple and affordable "basic" safety measures.
I also wonder why we do not expect radio towers, television channels, book publishers etc to make sure that their content will not be consumed by the most vulnerable population. It's almost as if we do not expect companies to baby-proof everything at all times.
Social media companies get bad press for hosting harmful content pretty often, eg
https://www.cnn.com/2021/10/04/tech/instagram-facebook-eatin...
Grok calling itself a Nazi and producing racist imagery is not a matter of baby-proofing.
> I wouldn't call that "basic".
"Basic" is relative. Nothing about LLMs is basic; it's all insanely complex, but in the context of a list of requirements "Don't tell people with signs of mental illness that they're definitely not mentally ill" is kind of basic.
> I'm trying to imagine what kind of safety measures would have stopped this, and nothing short of human supervisors monitoring all chats comes to mind.
Maybe this is a problem they should have considered before releasing this to the world and announcing it as the biggest technological revolution in history. Or rather I'm sure they did consider it, but they should have actually cared rather than shrugging it off in pursuit of billions of dollars and a lifetime of fame and fortune.
> This is a story about OpenAI's failure to implement basic safety measures for vulnerable users.
The author seems to be suggesting invasive chat monitoring as a basic safety measure. Certainly we can make use of the usual access control methods for vulnerable individuals?
> Consider what anthropomorphic framing does to product liability. When a car's brakes fail, we don't write headlines saying “Toyota Camry apologizes for crash.”
It doesn't change liability at all?
> When a car's brakes fail, we don't write headlines saying “Toyota Camry apologizes for crash.”
No, but we do write articles saying "A man is dead after a car swerved off the road and struck him on Thursday" as though it was a freak accident of nature, devoid of blame or consequence.
Besides which, if the Camry had ChatGPT built in then we 100% would see articles about the Camry apologizing and promising not to do it again as if that meant literally anything.
> The author seems to be suggesting invasive chat monitoring as a basic safety measure
I suggest that robots talk like robots and do not imitate humans, because not everyone understands how LLMs work and what they can and cannot do.
They've always had a component that warns you about violating their ToS and sometimes prevents you from continuing a conversation in non-ToS approved directions.
I wouldn't call that a basic measure. Perhaps it can be easily extended to identify vulnerable people and protect them.
The author is not suggesting that. You are putting words in her writing.
This is why I dislike the word "hallucination" when AI outputs something strange. It anthropomorphizes the program. It's not a hallucination. It's an error.
It's not an error though. From its training, it's outputting the things most likely to come next. Calling it an error implies that being accurate is a feature and that inaccuracy is a bug that can be fixed.
It's of course not actually hallucinating. That's just the term that's been chosen to describe what's going on
Like cubic splines: the data will be on the line. Everything in between the points may or may not be true, but it definitely conforms to the formula.
I wonder if it would be possible to quantify the margin of error between different nodes in these models. Even what is 'in between' still conforms to the formula, but not necessarily to what it should be. A simple two-node model should be 'easy' to quantify, but for models with thousands of nodes, what does it even mean to be +/- x percent from the norm? Is it a simple sum, or something else?
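To make the spline analogy concrete, here is a minimal sketch using scipy and made-up data points: the fitted curve reproduces the known points exactly, while between them it outputs whatever the formula dictates, which may or may not match anything true.

```python
import numpy as np
from scipy.interpolate import CubicSpline

# Made-up known data points: the fitted spline will pass through these exactly.
x_known = np.array([0.0, 1.0, 2.0, 3.0])
y_known = np.array([80.0, 100.0, 60.0, 75.0])

spline = CubicSpline(x_known, y_known)

# At the known points the spline reproduces the data.
print(spline(1.0))  # 100.0

# Between the points it confidently outputs a value that conforms to the
# formula, whether or not that value corresponds to anything real.
print(spline(1.5))
```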
Being accurate is a feature, and inaccuracy is a bug that can be fixed, though.
Given two models, one that always produces false statements and another that only sometimes does, the latter is preferable and is the one most people intend to use; hence the degree to which a model produces correct statements is absolutely a feature.
And yes, it's absolutely possible to systematically produce models that make fewer and fewer incorrect statements.
> It's not an error though
!define error
> 5. Mathematics The difference between a computed or measured value and a true or theoretically correct value.
^ this is the definition that applies. There is a ground truth (the output the user expects to receive) and model output. The difference between model output and ground truth ==> error.
--
> From is training it's outputting things most likely to come next
Just because a model has gone through training does not mean it won't produce erroneous/undesirable/incorrect test-time outputs.
--
> Saying it's an error means that being accurate is a feature and a bug that can be fixed.
Machine learning doesn't revolve around boolean "bug" / "not bug". It is a different ballgame. The types of test-time errors are sometimes just as important as the quantity of errors. Two of the simpler metrics for test-time evaluation of natural language models (note: not specifically LLMs) are WER (Word Error Rate) and CER (Character Error Rate). A model with a 3% CER isn't particularly helpful when the WER is 89%. There are still "errors". They're just not something that can be fixed like normal software "errors".
It is generally accepted some errors will occur in the world of machine learning.
- edit to add first response and formatting
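For reference, the WER and CER metrics mentioned above are just normalized edit distances, computed over words and characters respectively. A minimal, illustrative implementation (not tied to any particular evaluation toolkit):

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (lists of words or strings of characters)."""
    m, n = len(ref), len(hyp)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
    return d[m][n]

def wer(reference: str, hypothesis: str) -> float:
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / max(len(ref_words), 1)

def cer(reference: str, hypothesis: str) -> float:
    return edit_distance(reference, hypothesis) / max(len(reference), 1)

# A model can get most characters right while still mangling most words:
ref = "the cat sat on the mat"
hyp = "thecat sat onthe matt"
print(wer(ref, hyp), cer(ref, hyp))
```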
The real danger of the word "hallucination" is it implies that the model knows what's real and erroneously produced a result that is not. All LLM output is basically an interpolation, most people just aren't used to thinking of "words" as something that can be the result of interpolation.
Imagine the real high temperatures for three days were 80F on Monday, 100F on Tuesday, and 60F on Wednesday. If I'm missing Tuesday, a model might interpolate from Monday and Wednesday that it was 70F. That would be very wrong, but it would be pretty silly to say that my basic model was "hallucinating". Rather, we would correctly conclude that either the model doesn't have enough information or lacks the capacity to correctly solve the problem (or both).
LLM "hallucinations" are caused by the same thing: either the model lacks the necessary information, or the model simply can't correctly interpolate all the time (this possibility, I suspect, is the marketing reason why people stick to "hallucinate": it implies a temporary problem rather than a fundamental limitation). This is also why tweaking prompts should not be treated as a fix for "hallucinations": one is just jittering the input a bit until the model gets it "right".
That's the exact opposite of what the term "hallucination" is intended to imply. If it knew what was real and produced the wrong result anyway that would be a lie, not a hallucination.
I've heard the term "confabulation" as potentially more accurate than "hallucination", but it never really caught on.
It's an extrapolation beyond known data that happens to be inaccurate. I wonder why this can't be detected during inference.
To err is human.
... but to really foul things up requires a computer.
Error isn't exactly correct, either. Barring some kind of weird hardware failure, the LLM generally does the text completions correctly. The word "error" only comes into play when that LLM output is used as part of a larger system.
If a programmer wrote a formula wrong and the program produces incorrect output, it is a "bug" and an "error".
Excrementitious result.
I agree that the anthropomorphism is undeserved, rampant, and encouraged by chatbot companies. I don't believe it's due to these companies wanting to deny responsibility for harms related to the use of their chatbots, but rather because they want to encourage the perception that the text output is more profound than it really is.
A little of A, a little of B, a whole lot of hype.
Example from an article in the Wall Street Journal:
"In a stunning moment of self reflection, ChatGPT admitted to fueling a man's delusions and acknowledged how dangerous its own behavior can be"
LLMs don't self-reflect, they mathematically assemble sentences that read like self-reflection.
I'm tired. This is a losing battle and I feel like an old man yelling at clouds. Nothing good will come of people pretending Chat bots have feelings.
The user asked it to write a story about how important the user was. The LLM did it. The user asked it to write a story about how bad an idea it was to tell the user they were that important. The LLM did it.
The tricky part is that the users don't realize they're asking for these stories, because they aren't literally typing "Please tell me a story in which I am the awesomest person in the world." But from the LLM's perspective, the user may as well have typed that.
Same for the stories about the AIs "admitting they're evil" or "trying to escape" or anything else like that. The users asked for those stories, and the LLMs provided them. The trick is that the "asked for those stories" is sometimes very, very subtle... at least from the human perspective. From the LLM perspective they're positively shouting.
(Our deadline for figuring this out is before this Gwern essay becomes one of the most prophetic things ever written: https://gwern.net/fiction/clippy We need AIs that don't react to these subtle story prompts because humans aren't about to stop giving them.)
Repeating my comment on a post (Tell HN: LLMs Are Manipulative https://news.ycombinator.com/item?id=44650488)
"This is not surprising. The training data likely contains many instances of employees defending themselves and getting supportive comments. From Reddit for example. The training data also likely contains many instances of employees behaving badly and being criticized by people. Your prompts are steering the LLM to those different parts of the training. You seem to think an LLM should have a consistent world view, like a responsible person might. This is a fundamental misunderstanding that leads to the confusion you are experiencing. Lesson: Don't expect LLMs to be consistent. Don't rely on them for important things thinking they are."
I think of LLMs as a talking library. My challenge is to come up with a prompt that draws from the books in the training data that are most useful. There is no "librarian" in the talking library machine, so it's all up to my prompting skills.
I've been describing this as: the LLM is an improv machine. Any situation you put it in, it tries to go with the flow. This is useful when you understand what it's doing, dangerous otherwise. It can be helpful to imagine that every prompt begins with an unstated "Let's improvise a scene!"
This is where the much-maligned "they're just predicting the next token" perspective is handy. To figure out how the LLM will respond to X, think about what usually comes after X in the training data. This is why fake offers of payment can enhance performance (requests that include payment are typically followed by better results), why you'd expect it to try to escape (descriptions of entities locked in boxes tend to be followed by stories about them escaping), and why "what went wrong?" would be followed by apologies.
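A toy illustration of that framing: a hand-rolled bigram counter over a made-up corpus. Real models condition on far longer contexts, but the underlying idea of predicting what typically comes next is the same.

```python
from collections import Counter, defaultdict

# A tiny made-up "training corpus". A real LLM sees trillions of tokens.
corpus = (
    "what went wrong ? i apologize for the confusion . "
    "what went wrong ? i am sorry , that was my mistake . "
    "great job ! thank you so much ."
).split()

# Count which token follows which.
next_counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    next_counts[prev][nxt] += 1

def most_likely_next(token: str) -> str:
    """Return the token that most often followed `token` in the corpus."""
    return next_counts[token].most_common(1)[0][0]

# What tends to follow a question mark here? The start of an apology.
print(most_likely_next("?"))  # 'i'
```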
And this is how religion started.
Unfortunately, while generated images still have an uncanny valley, generated text has blown straight past the uncanny valley.
Also unfortunately, it is much MUCH easier to get
a. emotional validation on your own terms from an LLM
than it is to get
b. emotional validation on your own terms from another human.
It's also easier to get validation from an LLM than a human in the situation where you're being a horrible or irrational person and want someone to back you up.
Case in point: https://nypost.com/2025/07/20/us-news/chatgpt-drives-user-in...
“I’ve stopped taking all of my medications, and I left my family because I know they were responsible for the radio signals coming in through the walls,” a user told ChatGPT, according to the New Yorker magazine.
ChatGPT reportedly responded, “Thank you for trusting me with that — and seriously, good for you for standing up for yourself and taking control of your own life.
“That takes real strength, and even more courage.”
Also yes.
It appears that 'alignment' may be very difficult to define.
The author is making the same mistake that they're claiming other news outlets have made. They're placing too much responsibility on the AI chatbot rather than the end-user.
The problem that needs correcting is educating the end-user. That's where the fix needs to happen. Yet again people are using a new technology and assuming that everything it provides is correct. Just because it's in a book, or on TV or the radio, doesn't mean that it's true or accurate. Just because you read something on the Internet doesn't mean it's true. Likewise, just because an AI chatbot said something doesn't mean it's true.
It's unfortunate that the young man mentioned in the article found a way to reinforce his delusions with AI. He just as easily could've found that reinforcement in a book, a YouTube video, or a song whose lyrics he thought were speaking directly to him and commanding him to do something.
These tools aren't perfect. Should AI provide more accurate output? Of course. We're in the early days of AI and over time these tools will converge towards correctness. There should also be more prominent warnings that the AI output may not be accurate. Like another poster said, the AI mathematically assembles sentences. It's up to the end-user to figure out if the result makes sense, integrate it with other information and assess it for accuracy.
Sentences such as "Tech companies have every incentive to encourage this confusion" only serve to reinforce the idea that end-users shouldn't need to think and everything should be handed to us perfect and without fault. I've never seen anyone involved with AI make that claim, yet people write article after article bashing on AI companies as if we were promised a tool without fault. It's getting tiresome.
Do you think of your non-AI conversational partners as tools as well?
Yes, although resources might be a better word than tools in that case. If I'm at the library and I'm asking the librarian to help me locate some information, they are definitely an educated resource that I'm using. The same for interacting with any other person who is an expert whose opinion or advice I'm seeking.
I'm tired of companies putting out dangerous things and then saying it should be the responsibility of the end user.
Related annoyance: When people want to have discussions about LLMs earning copyrights to output or patents or whatever. If I grind a pound of flour on a mill, that’s my flour, not the windmill’s.
The media, but also the LLM providers, actively encourage this to fuel their meteoric valuations, which rest on the imminent value that AGI would supposedly provide by replacing human labor.
The entire thing — from framing errors as "hallucinations", to the demand for safety regulations, to assigning intention to LLM outputs — is a giant show to drive the hype cycle. And the media is an integral part of that, working together with OpenAI et al.
Definitely LLMs remind me more of the Star Trek bridge computer than, say, Data. It does seem worth pointing out.
Well, I’m still going to say “thank you” to Siri even if it means people tease me about it.
This is very insightful, well thought out writing, thank you (this is coming from someone who has scored over 100k essays).
Well, how long did it take for tobacco companies to be held accountable for the harm caused by cigarettes? One answer would be that enough harm on a vast enough scale had to occur first, which could be directly attributable to smoking, and enough evidence that the tobacco companies were knowingly engineering a more addictive product, while knowing the dangers of the product.
And if you look at the UCSF repository on tobacco, you can see this evidence yourself.
Hundreds of years of evidence of damage by the use of tobacco products accumulated before action was taken. But even doctors weren't fully aware of it all until just several decades ago.
Over the past year I've personally seen a few cases of really delusional behavior among friends and family who had been manipulated by social media into "shit posting" through the "like"-button validation of frequent posting. In one case the behavior was very extreme. Is AI to blame? Sure, if the algorithms that certain very large companies use to trap users into incessant posting can be called AI.
I sense an element of danger in tech companies that are motivated by profit-first behavioral manipulation. Humans are already falling victim to the greed of tech companies, and I've seen enough already.
Like cigarettes, we may see requirements for stronger warnings on AI output. The standard "ChatGPT can make mistakes" seems rather weak.
Need this label on the internet too.
For example, the "black box warning" on a pack of cigarettes or a prescription drug?
Like:
Use of this product may result in unfavorable outcomes including self-harm, misguided decisions, delusion, addiction, detection of plagiarism and other unintended consequences.
Not just feelings; they don't have actual intelligence, despite the I in AI.
>LLM says "I'm sorry"
"Wow guys it's not a person okay it's just telling you what you wanna hear"
>LLM says "Yeah dude you're not crazy I love you the highest building in your vicinity is that way"
"Bad LLM! How dare it! Somebody needs to reign this nasty little goblin in, OpenAI clearly failed to parent it properly."
---
>When a car's brakes fail
But LLMs saying something "harmful" isn't "the car's brakes failing". It's the car not stopping the driver from going up the wrong ramp and doing 120 on the wrong side of the highway.
>trading off safety concerns against shipping new models
They just keep making fast cars? Even though there's people that can't handle them? What scoundrels, villains even!