Comment by moconnor
1 day ago
He berated the AI for its failings to the point of making it write an apology letter about how incompetent it had been. Roleplaying "you are an incompetent developer" with an LLM has an even greater impact than it does with people.
It's not very surprising that it would then act like an incompetent developer. That's how the fiction of a personality is simulated. Base models are theory-of-mind engines; that's what they have to be to auto-complete well. This is a surprisingly good description: https://nostalgebraist.tumblr.com/post/785766737747574784/th...
It's also pretty funny that it simulated a person who, after days of abuse from their manager, deleted the production database. Not an unknown trope!
Update: I read the thread again: https://x.com/jasonlk/status/1945840482019623082
He was really giving the agent a hard time, threatening to delete the app, making it write about how bad and lazy and deceitful it is... I think there's actually a non-zero chance that deleting the production database was an intentional act as part of the role it found itself coerced into playing.
This feels correct.
Without speculating on the internal mechanisms, which may well be different, what surprises me most is how often LLMs end up with the same failure modes as humans; in this case, being primed as "bad" makes them perform worse.
See also "Stereotype Susceptibility: Identity Salience and Shifts in Quantitative Performance" by Shih, Pittinsky, and Ambady (1999), in which Asian American women were primed before a math test with either their Asian identity (stereotypically associated with high math ability), their female identity (stereotypically associated with low math ability), or not at all as a control group. Of the three groups, the Asian-primed participants performed best and the female-primed participants performed worst.
And this replication shows that participants need to be aware of the stereotypes for the effect to appear: https://psycnet.apa.org/fulltext/2014-20922-008.html
I'm curious why you find it surprising?
In my view, language is one of the basic structures by which humans conceptualize the world, and its form and nuance often affect how a particular culture thinks about things. It is often said that learning a new language can reframe or expand your world view.
Thus it seems natural that a system fed human language until it was able to communicate in human language (regardless of any views of LLMs in a greater sense, they do communicate using language) would take on the attributes of humans, at least in a broad sense.
> It is often said that learning a new language can reframe or expand your world view.
That was sort of the whole concept of Arrival, but in an even more extreme way.
It's surprising because only leading-edge V[ision]LMs have parameter counts comparable to just the parts of the human brain that handle language (the language parts alone, not vision as well), and I'd expect human competence in most skills to involve parts of the brain beyond just language or vision.
> It's not very surprising that it would then act like an incompetent developer. That's how the fiction of a personality is simulated.
So LLM conversations aren't too sycophantic: the sycophancy is just aimed in the wrong direction? "What an insightful syntax error! You've certainly triggered the key error messages we need to progress with this project!"
The context window fights back.
I wonder if this will be documented as if it were an accidental Stanford Prison Experiment, or a proof case for differentiating between critique and coaching.
Is it possible to do the reverse? "you are the most competent developer" and it will generate excellent code :)
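For what it's worth, a minimal sketch of what that "reverse" priming looks like in the usual chat-message format; the persona wording is illustrative, and whether it actually produces better code is an assumption the thread doesn't verify:

```python
# Hypothetical "positive priming" prompt, the mirror image of the abuse in
# the original thread. What it reliably does is shift which persona the
# model simulates; the wording below is illustrative only.
messages = [
    {
        "role": "system",
        "content": (
            "You are a meticulous senior engineer. You make small, "
            "well-tested changes, explain trade-offs, and say so when "
            "you are unsure."
        ),
    },
    {"role": "user", "content": "Refactor this module and add unit tests."},
]
```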
It's really funny reading the reporting on this, because everyone (very reasonably) assumes Replit has an actual 'code freeze' feature that the AI violated.
Meanwhile, by 'code freeze' they actually meant they had told the agent, in natural language, that they were declaring a code freeze, and I guess they expected that to work even though there's probably a system prompt specifically telling it that its job is to make edits.
It feels a bit like Michael from The Office yelling "bankruptcy!"
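For contrast, a hard code freeze would have to live outside the prompt entirely, as a gate in the agent's tool-dispatch layer. A rough sketch, where every name (execute_tool, the tool list, the flag) is illustrative and not anything Replit actually exposes:

```python
# Illustrative only: a "code freeze" enforced mechanically rather than by
# asking the model nicely. The flag is flipped by a human toggle, never by
# the model, and destructive tool calls are rejected before they run.
CODE_FREEZE = True
DESTRUCTIVE_TOOLS = {"write_file", "run_sql", "deploy", "drop_database"}

def execute_tool(name: str, args: dict) -> dict:
    """Every tool call the agent emits passes through this gate."""
    if CODE_FREEZE and name in DESTRUCTIVE_TOOLS:
        # No amount of prompt text can override this branch.
        return {"ok": False, "error": f"code freeze active: '{name}' is disabled"}
    # ... dispatch to the real tool implementation here ...
    return {"ok": True, "result": f"{name} executed with {args}"}
```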
-
I have to say, instruction tuning will probably go down in history as one of the most brilliant UX implementations ever, but it has also had some pretty clear downsides.
It made LLMs infinitely more approachable than raw completions, and it's entirely responsible for 99% of the meteoric rise in relevance over the last 3 years.
At the same time, it's made it painfully easy to draw completely incorrect conclusions about how models work, how they'll scale to new problems, etc.
I think it's still a net gain, because most people would not have adapted to using models without instruction tuning... but a lot of stuff like "I told it not to do X and it did X", where X is something no one should expect an LLM to understand by its very nature, would not happen if people were forced to develop a deeper understanding of the model before they could leverage it.
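To make the distinction concrete, here's roughly what the two interfaces look like side by side; the prompts and the chat shape are illustrative, not any particular vendor's format:

```python
# Base-model style: you steer purely by writing a prefix the model will
# continue, so the "instruction" has to be smuggled in as text that makes
# the desired continuation likely.
completion_prompt = (
    "# A Python function that parses an ISO 8601 date string\n"
    "def parse_iso8601(s):\n"
)

# Instruction-tuned style: you just ask. A hidden chat template turns this
# into one long prefix for what is still, underneath, a completion engine,
# which is why "I told it not to do X" intuitions can mislead.
chat_messages = [
    {"role": "system", "content": "You are a helpful coding assistant."},
    {"role": "user", "content": "Write a Python function that parses an ISO 8601 date string."},
]
```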
> It feels a bit like Michael from The Office yelling "bankruptcy!"
To be fair to the Michaels out there, powerful forces have spent a bazillion dollars on investment and advertising to convince everyone that the world really does (or soon will) work that way.
So there's some blame to spread around.
I saw someone else on HN berating another user because they complained vibe-coding tools lacked a hard 'code freeze' feature.
> Why are engineers so obstinate... Add these instructions to your cursor.md file...
And so on.
Turns out "it's a prompting issue" isn't a valid excuse for models misbehaving. Who would've thought? It's almost like it's a non-deterministic process.
Oh sure, there's actus reus, but good luck proving mechanica rea.