Comment by throwanem
14 days ago
In form, the conversation he had (which appears to have ended five days ago along with all other public footprint) appears to me very much like a heavily refined and customizable version of "Qanon," [1] complete with intermittent reinforcement. That conspiracy theory was structurally novel in its "growth hacking" style of rollout, where ARG and influencer techniques were leveraged to build interest and develop a narrative in conjunction with the audience. That stuff was incredibly compelling when the Lost producers did it in 2010, and it worked just as well a decade later.
Of course, in 2020, it required people behind the scenes doing the work to produce the "drops." Now any LLM can be convinced with a bit of effort to participate in a "role-playing game" of this type with its user, and since Qanon itself was heavily covered and its subject matter broadly archived, even the actual structure is available as a reference.
I think it would probably be pretty easy to get an arbitrary model to start spitting out stuff like this, especially if you conditioned the initial context carefully to work around whatever after-the-fact safety measures may be in place, or just used one of the models that's been modified or finetuned to "decensor" it. There are collections of "jailbreak" prompts that go around, and I would expect Mr. Jawline Fillers here to be in social circles where that stuff would be pretty easy to come by.
From there, it doesn't seem too difficult to model how this becomes self-reinforcing, and I don't think pre-existing organic disorder is really required. How would anyone handle a machine that specializes in telling them exactly what they want to hear, and never ever gets tired of doing so?
Elsewhere in this thread, I proposed a somewhat sanguinary mental model for LLMs. Here's another, much less gory, and with which I think people probably are a lot more intuitively familiar: https://harrypotter.fandom.com/wiki/Mirror_of_Erised
I love the analogy of the Mirror of Erised. Obviously not quite the same thing, but similar tendencies, and with similar dangers. Very fitting!
You know, it's odd? I missed the whole initial fad, and only around 2018 or so got around to reading the books to see what all the hype had been about. (Even for a millennial I'm old, and I grew up in a pretty backward corner of the country; I've never played a Pokemon game, either...)
So why's it me, and not an actual fan, who should be the one to come up with Rowling's serial-numbers-filed-off Echo and Narcissus as the example?