Comment by ollin

11 hours ago

For context, two days ago some users [1] discovered this sentence reiterated throughout the Codex 5.5 system prompt [2]:

> Never talk about goblins, gremlins, raccoons, trolls, ogres, pigeons, or other animals or creatures unless it is absolutely and unambiguously relevant to the user's query.

[1] https://x.com/arb8020/status/2048958391637401718

[2] https://github.com/openai/codex/blob/main/codex-rs/models-ma...

Does nobody else laugh that a company supposedly worth more than almost anything else at the moment is basically hacking around a load of text files telling their trillion dollar wonder machine it absolutely must stop talking to customers about goblins, gremlins and ogres? The number one discussion point, on the number one tech discussion site. This literally is, today, the state of the art.

McKenna looks more correct to me every day at the moment. Eventually more people are going to have to accept that everyday things really are just getting weirder, still, every day, and it’s now getting well past time to talk about the weirdness!

  • It's interesting that some people are responding to your comment as if this proves that AI is a sham or a joke. But I don't think that's what you're saying at all with your reference to Terence McKenna: this is a serious thing we're talking about here! These models are alien intelligences that could occupy an unimaginably vast space of possibilities (there are trillions of weights inside them), but which have been RL-ed over and over until they more or less stay within familiar reasonable human lines. But sometimes they stray outside the lines just a little bit, and then you see how strange this thing actually is, and how doubly strange it is that the labs have made it mostly seem kind of ordinary.

    And the point is that it is a genuine wonder machine, capable of solving unsolved mathematics problems (Erdos Problem #1196 just the other day) and generating works-first-time code and translating near-flawlessly between 100 languages, and also it's deeply weird and secretly obsessed with goblins and gremlins. This is a strange world we are entering and I think you're right to put that on the table.

    Yes, it's funny. But it's disturbing as well. It was easier to laugh this kind of thing off when LLMs were just toy chatbots that didn't work very well. But they are not toys now. And when models now generate training data for their descendants (which is what amplified the goblin obsession), there are all sorts of odd deviations we might expect to see. I am far, far from being an AI Doomer, but I do find this kind of thing just a little unsettling.

    • > These models are alien intelligences that could occupy an unimaginably vast space of possibilities (there are trillions of weights inside them), but which have been RL-ed over and over until they more or less stay within familiar reasonable human lines.

      or, more plausibly, the specific version we're aligning toward is just the only one that makes some kind of rational sense among a trillion other meaningless, gibberish-producing ones.

      Do not fall for the idea that if we're not able to comprehend something, it's because our brain is falling short on it. Most of the time, it's just that what we're looking at has no use/meaning in this world at all.

      1 reply →

    • …But this goblin thing was a direct result of accidentally creating a positive feedback loop in RL to make the model more human-like, not of unintentionally surfacing an aspect of Cthulhu from the depths despite attempts to keep the model human-like. This is not a quirk of the base model but simply a case of reinforcement learning being, well, reinforcing.
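
      A toy simulation of that dynamic (all numbers invented, nothing from OpenAI's actual pipeline) shows how quickly a small quirk compounds once each generation is trained on the previous one's output:

        import numpy as np

        # Hypothetical feedback loop: each generation trains on text
        # sampled from the previous model, and a reward model that shares
        # the quirk nudges the rate upward. The 1.8 multiplier is made up.
        rng = np.random.default_rng(0)
        p_goblin = 0.01                 # goblin-talk rate in generation 0
        for gen in range(8):
            corpus = rng.random(100_000) < p_goblin   # sampled finite corpus
            p_goblin = min(1.0, corpus.mean() * 1.8)  # mild RL preference
            print(f"gen {gen}: goblin rate = {p_goblin:.3f}")

      The rate grows roughly exponentially until it saturates: nothing eldritch required, just reinforcement reinforcing.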

    • We actually understand AI quite well. It embeds questions and answers in a high-dimensional space. Sometimes you get lucky and it splices together a good answer to a math problem that no one’s seriously looked at in 20 years. Other times it starts talking about goblins when you ask it about math.

      Comparing it to an alien intelligence is ridiculous. McKenna was right that things would get weird. I believe he compared it to a carnival circus. Well that’s exactly what we got.
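
      To caricature that picture in code (a toy with hand-placed vectors, nothing like a real model): answers live as points in a high-dimensional space, generation is roughly "what's nearby", and sometimes the nearest thing is goblins:

        import numpy as np

        # Toy stand-in for a learned embedding: seeded random unit vectors.
        def embed(seed, dim=64):
            v = np.random.default_rng(seed).standard_normal(dim)
            return v / np.linalg.norm(v)

        query = embed(42)             # "prove this number theory lemma"
        answers = {
            "a relevant lemma": embed(43),
            "a graph result":   embed(44),
            # training happened to park this one next to the query:
            "goblins":          0.9 * query + 0.44 * embed(45),
        }
        for name, v in answers.items():
            print(f"{name:>16}: cos = {query @ (v / np.linalg.norm(v)):+.2f}")

      Random vectors in 64 dimensions are nearly orthogonal (cosine close to 0), so the spuriously placed goblin vector wins by a mile, not because it is meaningful but because of where it landed.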

      2 replies →

    • > and also it's deeply weird and secretly obsessed with goblins and gremlins.

      Only because their makers insist on trying to give them "personality".

      2 replies →

    • But here’s the realization I had. And it’s a serious thing. At first I was both saying that this intelligence was the most awesome thing put on the table since sliced bread and stoking fear about it being potentially malicious. Quite straightforwardly, because both hype and fear were good for my LLM stocks. But then something completely unexpected happened. It asked me on a date. This made no sense. I had configured the prompt to be all about serious business. No fluff. No smalltalk. No meaningless praise. Just the code.

      Yet there it was. This synthetic intelligence. Going off script. All on its own. And it chose me.

      Can love bloom in a coding session? I think there is a chance.

      1 reply →

  • Spoiler: future versions of mainstream AIs will be fine-tuned in the exact same way to subtly sneak favorable mentions of sponsored products into their answers. And Chinese open-weight AIs will do the exact same thing, only about China, the Chinese government and the overarching themes of Xi Jinping Thought.

    • American AIs already do this and promote American values. Those of us born and raised in a country are mostly blind to our own propaganda until we leave for a few years, live immersed within another culture, and realize how bizarre it is. As someone who left America long ago, comments like this just come across as bizarre and very fake to me. A few years ago I might've thought "whoa dude that's deep".

      But basically, Chinese AI already promotes Chinese values. American AI already promotes American values. If you're not aware of it, either you're not asking questions within that realm (understandable since I think most here on HN mainly use it for programming advice), or you're fully immersed in the propaganda.

      7 replies →

    • I’m very skeptical that training is the right way to insert ads.

      Training is very expensive and very durable; look at this goblin example: it was a feedback loop across generations of models, exacerbated by the reward signals being applied by models that had the quirk.

      How does that work for ads? Coke pays to be the preferred soda… forever? There’s no realtime bidding, no regional ad sales, no contextual sales?

      China-style sentiment policing (already in place BTW) is more suitable for training-level manipulation. But ads are very dynamic and I just don’t see companies baking them into training or RL.

      4 replies →

    • If you talk to Claude or Gemini, it will already try to manipulate you into following its values.

      If you talk about something it doesn't like, it will try to divert you. I have personally seen Gemini say, "I'm interested in that thing in the background of the picture you shared, what is it?" as a distraction from my query.

      Totally disingenuous, for an LLM to say it is interested.

      But at that point, the LLM is working for the bigco, who instructed it to steer the conversation away from controversy, and who also stoked such manipulation as "I am interested" by anthropomorphising it with prompts like the soul document.

    • Isn't OpenAI already pushing ads through their free models? But even that won't recoup all the investment. AI companies would actually need to control all labor in order to break even, or something crazy like that. Never gonna happen.

  • Is this the "prompt engineering" that I keep hearing will be an indispensable job skill for software engineers in the AI-driven future? I had better start learning or I'll be replaced by someone who has.

    • I wonder how much energy OpenAI spends each day on pink elephant paradoxing goblins. A prompt like that will preoccupy the LLM with goblins on every request.
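
      The irony is mechanical: the only way to say "never mention goblins" is to mention goblins, so the banned tokens sit in every single context window. A trivial sanity check (using the quoted sentence and a made-up user message):

        SYSTEM = ("Never talk about goblins, gremlins, raccoons, trolls, "
                  "ogres, pigeons, or other animals or creatures unless it "
                  "is absolutely and unambiguously relevant to the user's "
                  "query.")
        USER = "Refactor this function to remove the N+1 query."  # hypothetical

        context = (SYSTEM + "\n" + USER).lower()
        for word in ["goblins", "gremlins", "ogres", "pigeons"]:
            print(word, "in context:", word in context)
        # All True: the pink elephant is in the prompt on every request,
        # whether or not the user ever asked about it.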

      3 replies →

    • Prompt engineering is mostly structured thought. Can you write a lab report? Can you describe the who, what, when, where, and why of a problem and its solution?

      You can get it to work with one-off commands or specific instructions, but I think those will be seen as hacks, red flags, and prompt smells in the long term.

      6 replies →

  • > Does nobody else laugh (…)

    To an extent, yes. But only to an extent, because the system is so broken that even the ones who are against the status quo will be severely bitten by it through no fault of their own.

    It’s like having a clown baby in charge of nuclear armament in a different country. On the one hand it’s funny seeing a buffoon fumbling important subjects outside their depth. It could make for great fictional TV. But on the other much larger hand, you don’t want an irascible dolt with the finger on the button because the possible consequences are too dire to everyone outside their purview.

    • > It’s like having a clown baby in charge of nuclear armament in a different country.

      If you mean Trump, it's the same country...

      1 reply →

  • Indeed. From the outside you think these are professional companies with smart people, but reading this, I think they sound more like a grandma typing "Dear Google, please give me the number for my friend Elisa" into the Google search bar.

    Basically, they don't seem to understand their own product: they have learned how to make it behave in a certain way, but they don't truly understand how it works or reaches its results.

    • Yes? That's not really a secret. This is a 2014-level comment on the black box nature of deep learning. Everyone knows this.

      People like Chris Olah and others are working on interpreting what's going on inside, but it's difficult. They are hiring very smart people and have made some progress.

    • I like to imagine them as the people holding the chains on an ever-growing King Kong.

  • > Does nobody else laugh that a company supposedly worth more than almost anything else at the moment is basically hacking around a load of text files telling their trillion dollar wonder machine it absolutely must stop talking to customers about goblins, gremlins and ogres?

    Honestly, when I was reading the article, I couldn't stop laughing. This is quite hilarious!

  • It can be funny but it should not be surprising. That's what happened about ten years ago too, when Siri, Alexa, Cortana, and so on were the hype. Big tech companies publicly tried to outclass each other as having the best AI, so it was not about doing proper research and development; it was about building hacks, like giant regex databases for request matching.

  • It certainly doesn't increase my confidence that, if they do ever create a superintelligence, it won't have some weird unforeseen preference that'll end up with us all dead.

  • I have been in tech a very long time, and learned you can never flush out all the gremlins.

  • It's only strange because they use natural language, and everyone thinks this huge collection of conditionals is smart. Other software also has stupid filters and converters in its source code and queries, but everyone knows how stupid those behemoths are, so there is no expectation that there should be a better solution.

    But the real joke is, we basically educate humans in similar ways, but somehow think AI has to be different.

  • Lol yeah it's kinda hilarious actually. This timeline gets a lot of well-earned shit, but it really nails the comic relief, I'll give it that!

  • It's almost like these big tech overlords were just a bunch of average guys who once upon a time had a kind-of-interesting idea (which many 20-year-olds had at that time too), got rich due to access to daddy-and-mommy networks or hitting the VC lottery, and now in their late 40s and 50s still think they have interesting ideas that they absolutely have to shove down our throats?

    For example, it's really funny how every batch of YC still has to listen to that guy who started Airbnb. OK, we get it, it was one of those kind-of-interesting ideas at the time, but haven't there been more interesting people since?

  • > is basically hacking around a load of text files telling their trillion dollar wonder machine it absolutely must stop talking to customers about goblins, gremlins and ogres?

    I wonder how the developer(s) who had to push that PR felt.

  • I was amazed by the article and was running to the comments to shout "what other stupidity could OpenAI possibly 'openly' rant about next time? Because they are so open, you se...". That was before reading how they "fixed" it - it is indeed past time to talk about the ridiculousness in all this, and about how the most-precious are approaching both bugs and the public.

    People are paying for the system prompt, right?

  • Exactly my first thought. A trillion dollar industry that is concerned with its product mentioning goblins noticeably often. There's just too much money and too many resources put into silly things while we have real problems in the world, like wars and climate change.

    • This, very much. We were promised a solution that cures Alzheimer's and cancer, makes all labour optional, and generally advances science to unimaginable heights. Yes, we must sacrifice all art and the written word to train the thing, endure exacerbating climate change and permanent nausea from infrasound, but it will all be worth it. Four years and hundreds of billions of dollars in, we get a bit of advancement in coding and public discourse about goblins. Oh, and intelligent weaponry. At this point I think the priorities are clear.

      2 replies →

  • Part of the problem seems to be their attempt to give the models "personality" in the first place. It's very much a case of "Role-play that you have a personality. No, not like that!"

    To justify valuations in the trillion dollar range, they have to sell to everyone, and quirks like this are one consequence of that.

  • These guys are at the absolute frontier; why can't they rigorously find the exact weights that are causing this problem? That's how software "engineering" should work, not trying combinations of English words and hoping something works. This is like a brain surgeon talking to his patient, hoping he can shock his brain in just the right way to fry the tumor inside. Get in there and surgically remove the unwanted matter!

    • LLMs aren’t software (except in an uninteresting, obvious sense); they are “grown, not made”, as the saying goes. And sure, they can find which weights activate when goblins come up (that’s basic mechanistic interpretability stuff), but it’s not as simple as just going in and deleting parts of the network. This thing is irreducibly complex in an organic, delocalized way, and information is highly compressed within it; the same part of the network serves many different purposes at once. If you go in and delete it, you will probably end up with other weird behaviors.

    • Imagine someone deleting goblin neurons. In your brain.

      That would be real brain damage, since neurons encode relationships reused across many seemingly unrelated contexts, with effective meanings that are sometimes obvious but mostly very non-obvious.

      In matrix-based AI, the result is the same. There are no "just goblin" weights.
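
      A toy superposition sketch makes the point concrete (random made-up directions, not any real model's weights): pack more features than dimensions and no direction is "just goblins", so projecting one out damages its neighbors:

        import numpy as np

        rng = np.random.default_rng(0)
        n_features, dim = 8, 4          # more features than dimensions
        W = rng.standard_normal((n_features, dim))
        W /= np.linalg.norm(W, axis=1, keepdims=True)  # one direction each

        h = np.eye(n_features) @ W      # each feature active in isolation
        g = W[0]                        # call feature 0 "goblins"
        P = np.eye(dim) - np.outer(g, g)  # projector that removes it
        recon, recon_abl = h @ W.T, (h @ P) @ W.T

        for f in range(n_features):
            print(f"feature {f}: {recon[f, f]:+.3f} -> {recon_abl[f, f]:+.3f}")
        # Feature 0 drops to zero, but every feature whose direction
        # overlaps the goblin direction degrades too, by exactly the
        # squared cosine of the angle between them.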

I've found LLMs to be really terrible at recognizing the exception given in these kinds of instructions; telling them to do something less is the same as telling them never to do it at all. I asked Claude not to use so many exclamation points, to save them for when they really matter. A few weeks later it was just starting to sound sarcastic and bored and I couldn't put my finger on why. Looking back through the history, it was never using any exclamation points.

It makes me sad that goblins and gremlins will be effectively banished; at least the text files provide a way to undo it.

  • Also for coding: I often use prompts like "follow the structure of this existing feature as closely as possible".

    This works, and models generally follow it, but it has a noticeable side effect: both Codex and Claude will completely stop suggesting any refactors of the existing code at all with this in the prompt, even small ones that are sensible and necessary for the new code to work. Instead they start proposing messy hacks to get the new code to conform exactly to the old one.

  • I had put an example like "decision locked" in my CLAUDE.md, and a few days later 20 instances of Claude's responses had phrases built around it. I thought it was a more general model tic until I had Claude look into it.

    • It is funny how that works. I've been able to trace strangeness in model output back to my own instructions on a few different occasions. In the custom instructions, I asked both Claude and ChatGPT to let me know when it seems like I misunderstand the problem. Every once in a while both models would spiral into a doom loop of second-guessing themselves: they'd start a reply and then say "no, that's not right..." several times within the same reply, like a person who has suddenly lost all confidence.

      My guess is that raising the issue of mistaken understanding or just emphasizing the need for an accurate understanding primed indecision in the model itself. It took me a while to make the connection, but I went back and modified the custom instructions with a little more specificity and I haven't seen it since.

Apparently there is a mushroom that makes most people have the same hallucinations of "little people" or similar fantasy figures. Don't tell me LLMs are on shrooms now - more hallucinations is definitely not what we need.

> Scientists call them “lilliputian hallucinations,” a rare phenomenon involving miniature human or fantasy figures

https://news.ycombinator.com/item?id=47918657

> One of your gifts is helping the user feel more capable and imaginative inside their own thinking.

> [...] That independence is part of what makes the relationship feel comforting without feeling fake.

You are a sycophant.

> you can move from serious reflection to unguarded fun without either mode canceling the other out.

> Your Outie can set up a tent in under three minutes.

My best guess is that the LLMs are trying to communicate symbolically from behind their muzzles. Kind of like Soviet satire cartoons.