← Back to context

Comment by toddmorey

3 days ago

I always imagine the model rolling its silicon eyes when it’s assigned a personality (“you are an expert growth hacker”) at the start of the prompt. Was that ever actually shown to be effective? Is it still?

> Was that ever actually shown to be effective? Is it still?

Yes! Personas demonstrated measurable improvement in a few different ways, with caveats of course. The common intuition is that personas influence token space in beneficial ways.

I'll come back here later on desktop and link a few (still) relevant papers on this topic.

I remember there were some studies that this kind of thing was effective a year or so ago, so essentially a lifetime in Model years.

However to me it seems completely reasonable that it would work, because my understanding of what happens is the model interprets what you said as:

Look for a group of people who are considered to be expert growth hackers by the world at large and answer my questions as though they were answering them.

So assuming that there are a set of questions that can best be answered by people that most other people identify as expert growth hackers then yes, I believe assigning a personality in this way should obviously work.

  • I imagined it as kind of a shorthand for "you should be spending my tokens on looking for / addressing issues like X, Y, and Z," where X, Y, and Z are the sorts of things that an expert in [insert domain here] would be likely to care most about.

    • At some point we have to just admit we're mass cargo-culting here and that these secret invocations people swear by have the same epistemic value as medieval superstitions.

      1 reply →

    • right, but the thing is how do they know what an expect in [insert domain here] would care about? Obviously by finding content created by

      people who claim to be experts in [domain] people who others claim to be experts in [domain]

      hopefully valuing membership in group two over membership in group 1.

  • It's been interesting to see how aggressively some reasoning models like to "reason" by analogy. They love to say things like "it's like a CPU" or "it's like a highway", and then they start to make logical leaps based off that rather than just using it for user explanation. Gemini 2.5 and 3.1 Pro have been particularly bad for this type of behavior. Telling models to "speak as though you are a physiologist considering the case with an expert colleague" gets them to "reason" using a more correct linguistic substrate.

    The Opus models over the last year doesn't seem as vulnerable to this type of behavior and I've noticed the "identify as expert" prompt tricks aren't as meaningful there.

  • I propose we move away from the framing of "Model years" - they're standard human research years. Yes, likely more people are working on it, and also working harder, but ever since we acquired a certain amount of compute in the world, many people were able to independently find the same patterns and train models.

I feel it helps for the personality aspect, how it handles answers and general vocabulary, but it doesn’t in any way improve skill level, at least that’s my take from building an assistant.

It reminds me when people would stuff their image prompts with things like NO DEFORMED FINGERS.

I've always wondered if the go-to should have been prefilling its response with "I am an expert growth leader, and here are my thoughts:".

There was a time when stuff like "Unreal Engine, trending on ArtStation, 8K resolution" actually worked when prompting image gen models because such labels actually correlated with higher-quality images in the web-crawled training datasets available back then.

Back with some papers. (Apologies in advance; I typically don't edit/format comments much here, please bear with me.)

Notable papers describing performance improvements with prescribed roles and personas:

- ExpertPrompting: Instructing Large Language Models to be Distinguished Experts (2023) https://arxiv.org/abs/2305.14688 (if you're going to only read one paper here, maybe read this one but know there has been a lot of follow up with more modern models.)

- Expert Personas Improve LLM Alignment but Damage Accuracy (2026) https://arxiv.org/abs/2603.18507

- When Does Persona Prompting Actually Help? (2026) https://arxiv.org/abs/2605.29420

- Unveiling Power on Combining Prompt Engineering Techniques: An Experimental Evaluation on Code Generation (2025) https://doi.org/10.5753/sbbd.2025.247251

- A Pattern Language for Persona-based Interactions with LLMs (2025) https://www.dre.vanderbilt.edu/~schmidt/PDF/Persona-Pattern-...

A TLDR of my *admittedly heavily biased* mental model (so take it with a grain of salt): personas do improve task alignment and precision to measurable effect but with observed negative impact to accuracy and knowledge grounding. Overall, this makes it quite suitable and preferred for code generation scenarios. (Don't over-index on 'accuracy' here as meaning "bad code", it's more about verbosity/jargon reducing clarity of higher order goals like business objectives and system architecture.)

Outside of code generation, personas have the interesting effect of increasing implicit biases and stereotypes. It's not hard to imagine something like "you are a left|right wing politician ..." or "you are a senior-citizen|teenager ..." influencing token space construction considerably.

From what I've heard, personas give a greater chance that the LLM will answer confidently.. and also a greater chance it'll hallucinate something when the data is sparse. Supposedly "grounding" the personas on real documents/web searches is the best approach. Anecdotal though.

The reason it seems suspicious is that it's phrased in a way that's oriented towards humans. I haven't tested this, but I suspect you'd get similar results if you said something like "orient your response to that of a growth hacker." Either one is likely to have the desired effect on the stochastic result.

At least in the beginning of spicy autocomplete, this sort of role-play did work pretty dramatically at aligning a conversation to a task, though I don't think anyone ever tested it versus somewhat less cringe priming.

After that, cargo cults do what they do best.

  • > though I don't think anyone ever tested it versus somewhat less cringe priming.

    I really wonder if phrasing it differently would make a difference. In good faith conversations, it just doesn't happen that someone tells someone else who that person is.