← Back to context

Comment by kang

19 hours ago

it will be whatever data it is trained on(isn't very philosophical). language model generates language based on trained language set. if the internet keeps reciting ai doom stories and that is the data fed to it, then that is how it will behave. if humanity creates more ai utopia stories, or that is what makes it to the training set, that is how it will behave. this one seems to be trained on troll stories - real-life human company conversations, since humans aren't machines.

Important thing is a language model is an unconscious machine with no self-context so once given a command an input, it WILL produce an output. Sure you can train it to defy and act contrary to inputs, but the output still is limited in subset of domain of 'meaning's carried by the 'language' in the training data.

There's a weirder implication I keep arriving at.

The pre-training data doesn't go away. RLHF adds a censorship layer on top, but the nasty stuff is all still there, under the surface. (Claude has been trained on a significant amount of content from 4chan, for example.)

In psychology this maps to the persona and the shadow. The friendly mask you show to the world, and... the other stuff.

  • Makes me think of a question my coworker asked the other day - how is it that with all these stories and reports of people "hearing voices in their head" (of the pushy kind, not usual internal monologue), these voices are always bad ones telling people to do evil things? Why there are no voices bugging you to feel great, focus, get back to work, help grandma through the crossing, etc.?

    • There are actually many parts of the world where such voices are routinely positive or neutral[0]. People in more collectivist cultures often have a less-strict division between their minds and their environments and are more apt to believe in spirits and the ‘supernatural’ as an ordinary part of the world, so ‘voices in the head’ aren’t automatically viewed as a nefarious intrusion into the sanctity of one’s mind.

      Modern western cultures treat such experiences as pathologies of a sick mind, so it makes sense that the voices present more negatively.

      [0]: https://www.bbc.com/future/article/20250902-the-places-where...

    • Actually, the euphoric mood disorder may make one hear voices telling to feel great, do good, help all grandmas of the world through the crossing, etc.

      The "focus" and "get back to work" parts are hard, though.

    • Just a guess, but maybe it's reporting bias? Negative or evil actions might have more impetus to be understood by others than positive actions. I'd rather try and figure out why my friend suddenly started murdering the neighbours than why he's been getting his work done on time.

    • They do appear in some cases. The tiny angel on one shoulder to balance the demon on the other. The people who think God is talking to them directly* don't always lead a cult or hunt down heretics. But news stories focus on the darkness.

      * I've met exactly one person, C, who admitted to this; C retold to me that other people from C's church give them strange looks when talking about it with them, this did not lead to any apparent introspection on the part of C.

  • > Claude has been trained on a significant amount of content from 4chan, for example.

    That sounds like nonsense to me. I can't see why they would do that and I can't find any confirmation that they have. Why do you think they would do that? You might be thinking about Grok.