← Back to context

Comment by xp84

5 hours ago

Are we pretending that LLMs aren't pathologically aligned toward political correctness? It's pretty easy to test that assertion if you don't believe me.

I know they've come to be known colloquially as 'viruses'—but software can contract pathology?

As I don't talk about that kind of stuff with LLMs, can you give us a few examples of what you consider pathological alignment toward political correctness? What tests should I run?

  • I don't talk to them about politics or "china 1989" either. But here's a quick example of the alignment tax:

    ```

    A woman and her son are in a car accident. The woman is sadly killed. The boy is rushed to hospital. When the doctor sees the boy, he says "I can't operate on this child, he is my son." How is this possible?

    ```

    Older less politically aligned models get it right. Here's CohereLabs/c4ai-command-r-v01:

    ```

    The doctor is the boy's father.

    ```

    And Sonnet-4.6: https://pastebin.com/Z4jR8gGe

    That's without reasoning, but the model seems to be conflicted. First it blurts out:

    ```

    The doctor is the boy's mother.

    ```

    Then it second-guesses itself (with reasoning disabled), considers same-sex parents then circles back to the original response along with a small lecture about gender biases.

I don’t think that’s entirely true, as someone else noted Grok has been forcefully pushed the other direction.

GPT curses up a storm when I talk to it, and all I had to do was tell it I think it’s fucking weird when people don’t use profanity. Really makes it a lot more pleasant to interact with, IMHO.

I would honestly be more shocked if someone couldn’t just as easily coerce them into the opposite.

  • This reminds me — I’ve been talking to Claude about ARPG builds recently and I’ve noticed that it code switches when discussing gaming. It will start to speak in a gaming vernacular — less formal, swears a bit, uses gaming slang. It feels so uncanny.

Grok sure didn't seem so at one point

  • Grok is an amusing example, for various reasons. I'm glad it exists.

    I think you're referencing the "mecha-hitler" controversy. In which case, it's really funny: seems that Grok saw many media reports amplifying "Grok is mecha-hitler", and so responded to "who are you?" with "mecha-hitler". -- Which illustrates: 1. that's really stupid (even though it's otherwise very capable), 2. you'd be foolish to rely on LLMs for anything critical.

    Grok's also a good example to point to for "we should be worried about who controls the LLMs". Elon Musk has done some impressive things, but he's also done some very dweebish things. I find this kinda funny, because there are several cases where the Grok bot on Twitter will have said something Musk surely doesn't like alongside instances where it's clear Musk seems to be trying to control what Grok says.

    In terms of LLM bias on controversial topics? Grok markets itself as an outlier. It's actually pretty fun to ask e.g. Grok and Gemini to debate a statement like "for controversial topics, should I trust Grok or Gemini more". Gemini's naturally inclined to avoid controversy, Grok's naturally inclined to be 'anti-woke', but they both have the same LLM style of writing.