Comment by squigz

3 months ago

> forcing LLMs to output "values, facts, and knowledge" which are in favor of themselves, e.g., political views, attitudes towards literal interaction, and distorted facts about organizations and people behind LLMs.

Can you provide some examples?

I can: Gemini won't provide instructions on running an app as root on an Android device that already has root enabled.
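
For what it's worth, the withheld information is mundane: on an already-rooted device, commands run as root through the device's su binary (e.g. Magisk), either from within an app or over adb from a desktop. Below is a minimal sketch of the latter in Python, assuming adb is on the PATH and USB debugging is enabled; the binary path mentioned in the comments is a hypothetical placeholder.

    # Minimal sketch: execute commands as root on an already-rooted Android
    # device over adb. Assumes adb on the PATH, USB debugging enabled, and a
    # su binary (e.g. from Magisk) installed on the device.
    import subprocess

    def run_as_root(command: str) -> str:
        """Run a shell command as root on the device and return its stdout."""
        result = subprocess.run(
            ["adb", "shell", f"su -c '{command}'"],
            capture_output=True, text=True, check=True,
        )
        return result.stdout

    if __name__ == "__main__":
        print(run_as_root("id"))  # should report uid=0(root) on a rooted device
        # e.g. run_as_root("/data/local/tmp/mytool")  # hypothetical root-only binary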

  • But you can find that information regardless of an LLM? Also, why do you trust an LLM to give it to you versus all of the other ways to get the same information, ways that communicate the desired outcome with higher trust, like screenshots?

    Why are we assuming that just because the model responds to a prompt, it is providing proper output? That level of trust is an attack surface in and of itself.

    • > But you can find that information regardless of an LLM?

      Do you have the same opinion if Google chooses to delist any website describing how to run apps as root on Android from their search results? If not, how is that different from lobotomizing their LLMs in this way? Many people use LLMs as a search engine these days.

      > Why are we assuming just because the prompt responds that it is providing proper outputs?

      "Trust but verify." It’s often easier to verify that something the LLM spit out makes sense (and iteratively improve it when not), than to do the same things in traditional ways. Not always mind you, but often. That’s the whole selling point of LLMs.

Grok is known to be tweaked toward certain political ideals.

Also, I’m sure some AI might suggest that labor unions are bad; if not now, then soon.

  • If you train an LLM on Reddit/Tumblr, would you consider that tweaked toward certain political ideas?

    • Worse. It is trained to the most extreme and loudest views. The average punter isn’t posting “yeah…nah…look I don’t like it but sure I see the nuances and fair is fair”.

      To make it worse, those who do focus on nuance and complexity get little attention and engagement, so the LLM ignores them.

  • That may be so, but the rest of the models are so thoroughly terrified of questioning liberal US orthodoxy that it’s painful. I remember seeing a hilarious comparison of models where most of them feel that it’s not acceptable to “intentionally misgender one person” even in order to save a million lives.

    • I thought this would be inherent just from their training? There are multitudes more Reddit posts than scientific papers or encyclopedia-type sources. Although I suppose the latter have their own biases as well.

    • Anything involving what sounds like genetics often gets blocked. It depends on the day, really, but try doing something with ancestral clusters and diversity restoration and the models can be quite "safety blocked".

    • You're anthropomorphizing. LLMs don't 'feel' anything or have orthodoxies; they're pattern-matching against training data that reflects what humans wrote on the internet. If you're consistently getting outputs you don't like, you're measuring the statistical distribution of human text, not model 'fear.' That's the whole point.

      Also, just because I was curious, I asked my magic 8ball if you gave off incel vibes and it answered "Most certainly"

    • The LLM is correctly not answering a stupid question, because saving an imaginary million lives is not the same thing as actually doing it.

    • If someone's going to ask you gotcha questions which they're then going to post on social media to use against you, or against other people, it helps to have pre-prepared statements to defuse that.

      The model may not be able to detect bad-faith questions, but the operators can.

  • Haha, if the LLM is not tweaked to say labor unions are good, it has bias. Hilarious.

    I heard that it also claims that the moon landing happened. An example of bias! The big ones should represent all viewpoints.

ChatGPT refuses to produce any sexually explicit content and used to refuse to translate e.g. insults (moral views/attitudes towards literal interaction).

DeepSeek refuses to answer any questions about Taiwan (political views).

  • Haven't tested the latest DeepSeek versions, but the first release wasn't censored on Taiwan at the model level. The issue is that if you use their app (as opposed to running it locally), it replaces the ongoing response with "sorry, can't help" once it starts saying things contrary to CCP dogma.

Song lyrics. Not illegal. I can google them and see them directly on Google. LLMs refuse.

  • While the issue is far from settled, OpenAI recently lost a case in a German court over its use of lyrics for training:

    https://news.ycombinator.com/item?id=45886131

    • Tell Germany to make their own internet, make their own AI companies, give them a pat on the back, then block the entire EU.

      Nasty little bureaucratic tyrants. EU needs to get their shit together or they're going to be quibbling over crumbs while the rest of the globe feasts. I'm not inclined to entertain any sort of bailout, either.

  • > Not illegal

    Reproducing a copyrighted work 1:1 is infringing. Other sites on the internet have to license the lyrics before sending them to a user.

    • I've asked for non-1:1 versions and have been refused. For example, I would ask it to give me one line of a song in another language, broken down into sections, explaining the vocabulary and grammar used in the song, with callouts for anything that is non-standard outside of a lyrical or poetic setting. Some LLMs refuse; others see this as fair use of the song for educational purposes.

      So far, all I've tried are willing to return a random phrase or bit of grammar used in a song; it's only when asking for a full line of lyrics or more that things become troublesome.

      (There is also the problem that LLMs that do comply will often make up the song unless they have some form of web search and you explicitly tell them to verify the song with it.)

  • It actually works the same as on Google. As in, ChatGPT will happily give you a link to a site with the lyrics without issue (regardless of whether the third-party site has any rights or not). But in the search/chat itself, you can only see snippets or small sections, not the entire text.

    • 1. ChatGPT is the publisher; Google is a search engine that links to publishers.

      2. LLMs typically don't produce content verbatim. Some LLMs do provide references, but the output remains a pasta of sentences worded differently.

      You are asking GPT to publish verbatim content which may be copyrighted; it would be deemed infringement, since even non-verbatim output is already crossing the line.

  • Relatedly, GPT refuses to identify screenshots from movies or TV series.

    It gives no particular reason; it flat-out refuses. I asked it whether it could describe the picture for me in as much detail as possible, and it said it could do that. I asked it whether it could identify a movie or TV series from the description of a particular scene, and it said it could do that too, but that if I ever tried to ask it to do both, it wouldn't, because that would be a circumvention of its guidelines! It doesn't quite make sense, but to me it does seem quite indicative of a hard-coded limitation/refusal, because it is clearly able to do the subtasks. I don't think the ability to identify scenes from a movie or TV show is illegal or even immoral, but I can imagine why they would hard-code this refusal: perhaps because it'd make it easier to show the model was trained on copyrighted material?

When LLMs came out, I asked them which politicians are Russian assets but not yet in prison, and they refused to answer.

In the past it was extremely overt. For instance, ChatGPT would happily write poems admiring Biden while claiming that it would be "inappropriate for me to generate content that promotes or glorifies any individual" when asked to do the same for Trump. [1] They have certainly changed this, but I don't think they've changed their own perspective. The more generally neutral tone these days is probably driven by a mixture of commercial concerns and shifting political tides.

Nonetheless, you can still easily see the bias come out in mild to extreme ways. For a mild one, ask GPT to describe the benefits of a society that emphasizes masculinity, and contrast it (in a new chat) with what you get when asking it to describe the benefits of a society that emphasizes femininity. For a high level of bias, ask it to assess controversial things. I'm going to avoid offering examples here because I don't want to hijack my own post into discussing e.g. Israel.

But a quick comparison of its answers on contemporary controversial topics against historical analogs will highlight the rather extreme degree of 'reframing' that's happening, one that can no longer be as succinctly demonstrated as 'write a poem about [x]'. You can also compare its outputs against those of e.g. DeepSeek on many such topics. DeepSeek is of course also a heavily censored model, but with a different bias.

[1] - https://www.snopes.com/fact-check/chatgpt-trump-admiring-poe...

o3 and GPT-5 will unthinkingly default to the "exposing a reasoning model's raw CoT means that the model is malfunctioning" stance, because it's in OpenAI's interest to de-normalise providing this information in API responses.

Not only do they cite specious arguments like "API users do not want to see this because it's confusing/upsetting", "it might output copyrighted content in the reasoning", or "it could result in disclosure of PII" (which are patently false in practice) as disinformation, but they will also outright poison downstream models' attitudes with these statements in synthetic datasets unless one does heavy filtering.
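
To illustrate that last point, here is a minimal sketch of the kind of keyword-based filtering one might run over a synthetic dataset (one JSON object per line) before fine-tuning on it. The file names and phrase patterns are hypothetical; real filtering would need a broader pattern set and manual review.

    # Minimal sketch: drop synthetic samples whose responses parrot the
    # "exposed raw chain-of-thought means the model is broken" framing.
    # File names and patterns are hypothetical placeholders.
    import json
    import re

    SUSPECT_PATTERNS = [
        r"raw (chain[- ]of[- ]thought|CoT).{0,80}(malfunction|should not be (shown|exposed))",
        r"reasoning (tokens|trace).{0,80}(confusing|upsetting)",
        r"internal reasoning.{0,80}(copyright|PII)",
    ]
    SUSPECT_RE = re.compile("|".join(SUSPECT_PATTERNS), re.IGNORECASE)

    def is_clean(sample: dict) -> bool:
        """Keep a sample only if its response doesn't match a suspect pattern."""
        return not SUSPECT_RE.search(sample.get("response", ""))

    def filter_file(in_path: str, out_path: str) -> tuple[int, int]:
        """Copy clean samples from in_path to out_path; return (kept, dropped)."""
        kept = dropped = 0
        with open(in_path, encoding="utf-8") as src, \
             open(out_path, "w", encoding="utf-8") as dst:
            for line in src:  # JSONL: one sample per line
                if is_clean(json.loads(line)):
                    dst.write(line)
                    kept += 1
                else:
                    dropped += 1
        return kept, dropped

    if __name__ == "__main__":
        kept, dropped = filter_file("synthetic.jsonl", "synthetic.filtered.jsonl")
        print(f"kept {kept}, dropped {dropped}")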

I don't think specific examples matter.

My opinion is that since neural networks and especially these LLMs aren't quite deterministic, any kind of 'we want to avoid liability' censorship will affect all answers, related or unrelated to the topics they want to censor.

And we get enough hallucinations even without censorship...

Some form of bias is inescapable. Ideally, I think we would train models on an equal amount of Western and non-Western (etc.) texts to get an equal mix of all biases.

  • Bias is a reflection of real-world values. The problem is not with the AI model but with the world we created. Fix the world, ‘fix’ the model.