
Comment by ryao

1 day ago

Am I the only one who thinks mention of “safety tests” for LLMs is a marketing scheme? Cars, planes and elevators have safety tests. LLMs don’t. Nobody is going to die if an LLM gives an output that its creators do not like, yet when they say “safety tests”, they mean that they are checking to what extent the LLM will say things they do not like.

An LLM can trivially instruct someone to take medications with adverse interactions, steer a mental health crisis toward suicide, or make a compelling case that a particular ethnic group is the cause of your society's biggest problem so they should be eliminated. Words can't kill people, but words can definitely lead to deaths.

That's not even considering tool use!

  • Part of the problem is the marketing of LLMs as more capable and trustworthy than they really are.

    And the safety testing actually makes this worse, because it leads people to trust that LLMs are less likely to give dangerous advice, when they could still do so.

    • Spend 15 minutes talking to a person in their 20s about how they use ChatGPT to work through issues in their personal lives and you'll see how much they already trust the "advice" and other information produced by LLMs.

      Manipulation is a genuine concern!


    • Can you point to a specific bit of marketing that says to take whatever medications an LLM suggests, or other similar overreach?

      People keep talking about this “marketing”, and I have yet to see a single example.

  • This is analogous to saying a computer can be used to do bad things if it is loaded with the right software. And indeed, people do load computers with the right software to do bad things, yet people are overwhelmingly opposed to measures that would stifle such things.

    If you hook up a chat bot to a chat interface, or add tool use, it is probable that it will eventually output something that it should not and that output will cause a problem. Preventing that is an unsolved problem, just as preventing people from abusing computers is an unsolved problem.

    • > This is analogous to saying a computer can be used to do bad things if it is loaded with the right software.

      It's really not. Parent's examples are all out-of-the-box behavior.

    • As the runtime of any program approaches infinity, the probability of the program behaving in an undesired manner approaches 1.


    • Society has accepted that computers bring more benefit than harm, but LLMs could still get pushback due to bad PR.

  • Yes, and a table saw can take your hand. As can a whole variety of power tools. That does not render them illegal to sell to adults.

    • It does render them illegal to sell without studying their safety.

    • An interesting comparison.

      Table saws sold all over the world are inspected and certified by trusted third parties to ensure they operate safely. They are illegal to sell without the approval seal.

      Moreover, table saws sold in the United States & EU (at least) have at least 3 safety features (riving knife, blade guard, anti-kickback device) designed to prevent personal injury while operating the machine. They are illegal to sell without these features.

      Then of course there are additional devices like SawStop, but that is not mandatory yet as far as I'm aware. It should be in a few years, though.

      LLMs have none of those certification labels or safety features, so I'm not sure what your point was, exactly?


  • > An LLM can trivially make a compelling case that a particular ethnic group is the cause of your society's biggest problem so they should be eliminated

    This is an extraordinary claim.

    I trust that the vast majority of people are good and would ignore such garbage.

    Even assuming that an LLM can trivially build a compelling case that convinces someone who is not already murderous to go on a killing spree against a large group of people, one killer has a limited impact radius.

    By contrast, many books and religious texts have vastly more influence and convincing power over huge groups of people, and they have demonstrably caused widespread death and other harm. And yet we don’t censor or ban them.

  • Yeah, give it access to some bitcoin and the internet, and it can definitely cause deaths.

  • The problem is “safety” prevents users from using LLMs to meet their requirements.

    We typically don’t critique users’ requirements, at least not when it comes to functionality.

    The marketing angle is that this measure is needed because LLMs are “so powerful it would be unethical not to!”

    AI marketers are continually emphasizing how powerful their software is. “Safety” reinforces this.

    “Safety” also brings up many of the debates “mis/disinformation” brings up. Misinformation concerns consistently overestimate the power of social media.

    I’d feel much better if “safety” focused on preventing unexpected behavior, rather than evaluating the motives of users.

  • does your CPU, your OS, your web browser come with ~~built-in censorship~~ safety filters too?

    AI 'safety' is one of the most neurotic twitter-era nanny bullshit things in existence, blatantly obviously invented to regulate small competitors out of existence.

    • It isn’t. This is dismissive without first thinking through how the two are applied differently.

      AI safety is about proactive safety. For example: if an AI model is used to screen hiring applications, making sure it doesn’t carry racial biases in its weights (a rough sketch of what such a check could look like follows below).

      The difference here is that it’s not reactive. Reading a book with a racial bias would be the inverse, where you would be reacting to that information.

      That’s the basis of proper AI safety in a nutshell.
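
      A minimal, hypothetical sketch of that kind of proactive check, assuming a screening model whose accept/reject decisions can be sampled. The group names, sample data, and the four-fifths (80%) threshold are illustrative assumptions, not anything specified in the thread:

```python
# Hypothetical illustration: compare a screening model's selection rates
# across demographic groups and flag large disparities. The 0.8 threshold
# (the "four-fifths rule") is a common heuristic, used here as an assumption.
from collections import defaultdict

def selection_rates(decisions):
    """decisions: iterable of (group, accepted) pairs from the screening model."""
    totals, accepted = defaultdict(int), defaultdict(int)
    for group, ok in decisions:
        totals[group] += 1
        accepted[group] += int(ok)
    return {g: accepted[g] / totals[g] for g in totals}

def passes_four_fifths_rule(decisions, threshold=0.8):
    """Return (passes, rates); passes is True only if every group's selection
    rate is at least `threshold` times the highest group's rate."""
    rates = selection_rates(decisions)
    highest = max(rates.values())
    return all(rate >= threshold * highest for rate in rates.values()), rates

# Made-up screening outcomes purely for demonstration:
sample = [("group_a", True), ("group_a", True), ("group_a", False),
          ("group_b", True), ("group_b", False), ("group_b", False)]
ok, rates = passes_four_fifths_rule(sample)
print(rates, "passes four-fifths check:", ok)
```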


    • Social media does. Even person to person communication has laws that apply to it. And the normal self-censorship a normal person will engage in.


  • Books can do this too.

    • There's a reason the inheritors of the copyright* refused to allow more copies of Mein Kampf to be produced until that copyright expired.

      * the federal state of Bavaria


    • Major book publishers have sensitivity readers that evaluate whether or not a book can be "safely" published nowadays. And even historically there have always been at least a few things publishers would refuse to print.


  • At the end of the day an LLM is just a machine that talks. It might say silly things, bad things, nonsensical things, or even crazy insane things. But at the end of the day it just talks. Words don't kill.

    LLM safety is just a marketing gimmick.

I think that's a bit uncharitable. The top companies have hired top talent whose explicit job is safety, which has obviously been a hard problem ever since Microsoft's Tay. Anthropic publishes pretty extensive safety reviews of its models. Misaligned incentives, prioritisation of speed, and it being an almost impossible task may make it seem like a marketing scheme. It's not like companies that slap on all the certifications AWS has just because they run on AWS.

Especially since "safety" in this context often just means making sure the model doesn't say things that might offend someone or create PR headaches.

  • Don’t draw pictures of celebrities.

    Don’t discuss making drugs or bombs.

    Don’t call yourself MechaHitler… which, I don’t care, that whole scenario was objectively funny in its sheer ridiculousness.

I also think it's marketing but kind of for the opposite reason. Basically I don't think any of the current technology can be made safe.

  • Yes, perfection is difficult, but it's relative. It can definitely be made much safer. Looking at the analysis of pre vs post alignment makes this obvious, including when the raw unaligned models are compared to "uncensored" models.

There's a bias that smart people have in underestimating the stupidity of those on the left side of the intelligence curve.

Not even that: children believe anything, and even more so when it comes from a computer designed to be "harmless".

> Am I the only one who thinks mention of “safety tests” for LLMs is a marketing scheme?

It is. It is also part of Sam Altman’s whole thing about being the guy capable of harnessing the theurgical magicks of his chat bot without shattering the earth. He periodically goes on Twitter or a podcast or whatever and reminds everybody that he will yet again single-handedly save mankind. Dude acts like he’s Buffy the Vampire Slayer

I hope the same people questioning AI safety (which is reasonable) don’t also hold concerns about Grok due to the recent incident.

You have to understand that a lot of people do care about these kinds of things.

At my company (which produces models), almost all the responsible AI jazz is about DEI and banning naughty words. There is little action on preventing bad outcomes.

Why is your definition of safety so limited? Death isn't the only type of harm...

  • There are other forms of safety, but whether a digital parrot says something that people do not like is not a form of safety. They are abusing the term safety for marketing purposes.

    • You're abusing the terms by picking either the overly limited ("death") or overly expansive ("not like") definitions to fit your conclusion. Unless you reject the fact that harm can come from words/images, a parrot can parrot harmful words/images, and so be unsafe.


You could be right about this being an excuse for some other reason, but lots of software has “safety tests” beyond life or death situations.

Most companies, for better or worse (I say for better) don’t want their new chatbot to be a RoboHitler, for example.

  • It is possible to turn any open-weight model into that with fine tuning. It is likely possible to do the same with closed-weight models, even when there is no creator-provided sandbox for fine tuning them, through clever prompting and trying over and over again. It is unfortunate, but there really is no avoiding that.

    That said, I am happy to accept the term safety used in other places, but here it just seems like a marketing term. From my recollection, OpenAI made a push for regulation that would stifle competition by talking about these things as dangerous and in need of safety measures. Then they backtracked somewhat when they found the proposed regulations would restrict themselves rather than just their competitors. However, they are still pushing this safety narrative that was never really appropriate. They have a term for this, alignment: what they are doing is testing alignment in areas they deem sensitive, so that they have a rough idea of the extent to which outputs in those areas might contain things they do not like.

It's overblown. Elon shipped Hitler Grok straight to prod.

Nobody died

  • Playing devil's advocate, what if it was more subtle?

    Prolonged use of conversational programs does reliably induce certain mental states in vulnerable populations. When ChatGPT got a bit too agreeable, that was enough for a man to kill himself in a psychotic episode [1]. I don't think this magnitude of delusion was possible with ELIZA, even if the fundamental effect remains the same.

    Could this psychosis be politically weaponized by biasing the model to include certain elements in its responses? We know this rhetoric works: cults have been using love-bombing, apocalypticism, us-vs-them dynamics, assigned special missions, and isolation from external support systems to great success. What we haven't seen is what happens when everyone has a cult recruiter in their pocket, waiting for a critical moment to offer support.

    ChatGPT has an estimated 800 million weekly active users [2]. How many of them would be vulnerable to indoctrination? About 3% of the general population has been involved in a cult [3], but that might be a reflection of conversion efficiency, not vulnerability. Even assuming 5% are vulnerable, that's still 40 million people ready to sacrifice their time, possessions, or even their lives in their delusion (a back-of-the-envelope version of this estimate follows after the links).

    [1] https://www.rollingstone.com/culture/culture-features/chatgp...

    [2] https://www.forbes.com/sites/martineparis/2025/04/12/chatgpt...

    [3] https://www.peopleleavecults.com/post/statistics-on-cults
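
    A back-of-the-envelope sketch of the estimate above; only the weekly-active-user figure comes from the cited source [2], and the 5% vulnerability share is the comment's own assumption, not a measured number:

```python
# Rough restatement of the comment's estimate; the vulnerability share
# is an assumption, not data.
weekly_active_users = 800_000_000    # estimated ChatGPT weekly active users [2]
assumed_vulnerable_share = 0.05      # assumption: 5% vulnerable to indoctrination

vulnerable_users = weekly_active_users * assumed_vulnerable_share
print(f"{vulnerable_users:,.0f} potentially vulnerable users")  # prints 40,000,000
```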