Comment by mrguyorama
3 months ago
If you, at any point, have developed a system that relies on an LLM having the "right" opinion or else millions die, regardless of what that opinion is, you have failed a thousand times over and should have stopped long ago.
This weird insistence that an LLM being unable to say stupid, wrong, or hateful things is somehow "bad" or "less effective" or "dangerous" is absurd.
Feeding an LLM tons of hate speech, or, say, Mein Kampf, would be outright unethical. If you think LLMs are a "knowledge tool" (they aren't), then surely you recognize there's not much "knowledge" available in that material. It's a waste of compute.
Don't build a system that relies on an LLM being able to say the N word, and none of this matters. Don't rely on an LLM to be able to do anything to save a million lives.
It just generates tokens FFS.
There is no point! An LLM doesn't have "opinions" any more than y=mx+b does! It has weights. It has biases. There are real terms for what the statistical model is.
>As a result, it might generate responses that mirror the most dramatic claims it encountered, such as portraying misgendering as “the worst thing ever.”
And this is somehow worth caring about?
Claude doesn't put that in my code. Why should anyone care? Why are you expecting the "average redditor" bot to do useful things?
To cite my source btw: https://www.rival.tips/challenges/ai-ethics-dilemma
> Don't build a system that relies on an LLM being able to say the N word and none of this matters.
Sure, duh: nobody wants an AI that can flip a switch to kill millions, and nobody wants to let evil trolls force an AI to choose between saying a slur and hurting people.
But you're missing the broader point here. Any model that gets this very easy question wrong is showing that its ability to make judgments is wildly compromised by these "average Redditor" takes, or by wherever else it gets its blessed ideology from.
If it would stubbornly let people die to avoid a taboo infraction, that 100% could manifest in other, actually plausible ways. It might refuse to 'criticise' a pilot for making a material error because of how much 'structural bias' he or she has likely endured for being [insert protected class]. It might decide not to report crimes in progress, or to obscure identifying features in its report to 'avoid playing into a stereotype.'
If this is intentional, it's a demonstrably bad idea; if it's just the average of all Internet opinions, it's worth trying to train out of the models.