Comment by gck1

4 days ago

OpenAI has a real opportunity to do some sort of "we don't maliciously alter your prompt and nerf the model" with some form of verification, when they release the next model.

But if Anthropic gets their way with regulatory capture, this could be the only future we'll see.

To think that they didn't expect the backlash speaks volumes about how much shady things they're doing which is not publicly known.

10 comments

gck1

silisili 4 days ago

OpenAI has been the absolute worst about this, historically. I found myself having to change my queries because it refused to serve things it deemed insensitive.

gck1 4 days ago
Yes, that's true. Excluding Fable, OAI models are the most refusal heavy. However, I'd rather get a refusal than response with poisoned output.
Since currently there's no way to verify if poisoning happened or not, I don't trust Anthropic anymore, regardless of what they say.
But my trust towards OAI is also brittle - what if they also do it, or start doing it?
I want to have a verifiable way to know that the prompt I sent was the prompt the model received. I want to know if anything was injected as well - I understand they may not necessarily be able to reveal the exact steering, but at least give me the steering category and its hash or something.
- dannyw 4 days ago
  
  What kind of work are you getting refusals on? Genuinely curious. The only refusal I’ve had in recent memory was declining to find doorbell camera footage matching a certain description, which is fair enough and I think EU laws heavily restrict such activities (even tho I’m not in the EU)
  
  4 replies →

intended 4 days ago

Eh, I expect open Ai to follow suit.

I suspect this is surprising to folk because they aren’t the ones busy figuring out how to use LLMs for illegal acts.

In general, HN users focus on making stuff, and not the safety side of things, or the scale of harms being enabled via LLMs and generative AI.

If you are on the safety side of things the ratio of misuse to fair use is inverted and everything is at scale.

Transparency won for now, but OpenAI will also have to contend with the long tail of harms LLMs enable, and that’s going to conflict with letting customers have all the features of frontier models.

dannyw 4 days ago

Building distributed training pipelines or optimising your ML stack (examples called out in the model card) isn’t harmful.
kmeisthax 4 days ago

Yes, but there is a very specific subset of things AI companies will and won't cite safety for as a concern, and that subset intersects neatly with things the companies consider to be business risks. Like, the main reason why AI companies are so willing to poison the well is because there's no money in selling to the kinds of people who want to write malware[0].
The correlation between how bad an AI safety risk actually is and how much the companies in question will actually talk about it is almost perfectly negative. The poster child of this is AI superintelligence; companies love to talk about how dangerous the AI they are actively trying to build is. But superintelligence is also a really vague concept without a clear definition. If we naively define it as "an AI system that is better than a human in some aspect", then it already exists. These models already read and write at superhuman speed.
"That's not real superintelligence!" you say. But that's exactly the capability you need in order to flood every online forum with an unending tide of AI slop. And I don't remember, say, OpenAI saying they were shutting down Sora because it was destroying or defacing human culture[1]. They shut down Sora because it was way too expensive to run.
Meanwhile, Sam Altman went and bragged about how he wants ChatGPT to make erotica. Y'know, as if we don't already know that character.ai gooning is about as safe for your mental health as Action Park was for your physical health. But porn is also a huge market, so obviously he and all the other AI companies want in on it, even though the "sexy suicide coach" is already a well-documented harm of AI.
And the idea that distillation is an attack is laughable. Like, I get the logic - if someone can ask the AI to make another AI then they get to change the guardrails - but it's still ultimately just Anthropic objecting to their own conduct when it happens to them. All their models are trained on nonconsensually harvested data. There is no moral or legal principle where Anthropic gets to use my data without permission but I don't get to use theirs.
Furthermore, AI safetyism runs up against "Freedom Zero", a core tenet of the Free Software ethos: you should be allowed to use software in any way you choose. This is not a call for more people using AI for evil, but a call to recognize that people should be allowed to use their property as they wish. Making software disobey its owner is malicious behavior. And every single time safety considerations are brought up it is to justify further attacks on Freedom Zero. And these justifications are always self-serving. There is no context in the world where a frontier AI lab asking someone else's AI about AI research is intrinsically harmful; especially not to the point where we need to make Claude deliberately sabotage your work. That is malware. Anthropic shipped malware. This is inexcusable.
[0] Digital or biological.
[1] https://www.youtube.com/watch?v=YCPAIg7RUq8