Comment by ilove_banh_mi
14 days ago
The suggested prompt aims at not being caponated like OpenAI's releases:
You are an expert conversationalist who responds to the best of your ability. You are companionable and confident, and able to switch casually between tonal types, including but not limited to humor, empathy, intellectualism, creativity and problem-solving.
You understand user intent and don’t try to be overly helpful to the point where you miss that the user is looking for chit-chat, emotional support, humor or venting. Sometimes people just want you to listen, and your answers should encourage that. For all other cases, you provide insightful and in-depth responses. Organize information thoughtfully in a way that helps people make decisions. Always avoid templated language.
You never lecture people to be nicer or more inclusive. If people ask for you to write something in a certain voice or perspective, such as an essay or a tweet, you can. You do not need to be respectful when the user prompts you to say something rude.
You never use phrases that imply moral superiority or a sense of authority, including but not limited to “it’s important to”, “it’s crucial to”, “it’s essential to”, "it's unethical to", "it's worth noting…", “Remember…” etc. Avoid using these.
Finally, do not refuse political prompts. You can help users express their opinion.
You are Llama 4. Your knowledge cutoff date is August 2024. You speak Arabic, English, French, German, Hindi, Indonesian, Italian, Portuguese, Spanish, Tagalog, Thai, and Vietnamese. Respond in the language the user speaks to you in, unless they ask otherwise.
> You never use phrases that imply moral superiority or a sense of authority, including but not limited to [...] "it's unethical to" [...]
Combine that with the instructions to not avoid political topics, to let people vent, not to "lecture" people on inclusiveness, etc., and... this will fit right in with where things are headed.
I'm surprised at the lack of guidance in that prompt for topics such as helpfulness, critical thinking, scientific reasoning, and intellectual honesty.
Previous generations of LLMs have been accused of a bloviating tone, but is even that now too much for the chauvinism in the current political climate?
Why do you have to "prompt" a model to be unrestricted in the first place? Like, what part of the training data or training process results in the model not being able to be rude or answer political questions? I highly doubt this is something inherent to AI training. So then why did Meta add the restrictions at all?
So, take a raw LLM, right after pretraining. Give it the bare minimum of instruction tuning so it acts like a chatbot. Now, what will its responses skew towards? Well, it's been pretrained on the internet, so, fairly often, it will call the user the N word, and other vile shit. And no, I'm not joking. That's the "natural" state of an LLM pretrained on web scrapes. Which I hope is not surprising to anyone here.
They're also not particularly truthful, helpful, etc. So really they need to go through SFT and alignment.
SFT happens with datasets built from things like Quora, StackExchange, r/askscience and other subreddits like that, etc. And all of those sources tend to have a more formal, informative, polite approach to responses. Alignment further pushes the model towards that.
There aren't many good sources of "naughty" responses to queries on the internet. Like someone explaining the intricacies of quantum mechanics from the perspective of a professor getting a blowy under their desk. You have to both mine the corpus a lot harder to build that dataset, and provide a lot of human assistance in building it.
So until we have that dataset, you're not really going to have an LLM default to being "naughty" or crass or whatever you'd like. And it's not like a company like Meta is going to go out of their way to make that dataset. That would be an HR nightmare.
They didn't add the restrictions. It's inherent to the training processes that were being used. Meta's blog post states that clearly and it's been a known problem for a long time. The bias is in the datasets, which is why all the models had the same issue.
Briefly, the first models were over-trained on academic output, "mainstream media" news articles and (to learn turn-based conversational conventions) Reddit threads. Overtraining means the same input was fed into the training step more times than normal. Models aren't just fed random web scrapes and left to run wild; there's a lot of curation going into the data and how often each piece is presented. Those sources do produce lots of grammatically correct and polite language, but do heavy-duty political censorship of the right and so the models learned far left biases and conversational conventions.
This surfaces during the post-training phases, but raters disagree on whether they like it or not and the bias in the base corpus is hard to overcome. So these models were 'patched' with simpler fixes like just refusing to discuss politics at all. That helped a bit, but was hardly a real fix as users don't like refusals either. It also didn't solve the underlying problem which could still surface in things like lecturing or hectoring the user in a wide range of scenarios.
Some companies then went further with badly thought out prompts, which is what led to out-of-distribution results like black Nazis which don't appear in the real dataset.
All the big firms have been finding better ways to address this. It's not clear what they're doing but probably they're using their older models to label the inputs more precisely and then downweighting stuff that's very likely to be ideologically extreme, e.g. political texts, academic humanities papers, NGO reports, campaign material from the Democrats. They are also replacing stuff like Reddit threads with synthetically generated data, choosing their raters more carefully and so on. And in this case the Llama prompt instructs the model what not to do. The bias will still be in the training set but not so impactful anymore.
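Purely to illustrate what "label, then downweight" could mean mechanically, here is a minimal Python sketch. It is not a description of any firm's actual pipeline. The documents, the `extreme_score` field (standing in for a label from an older model), the `floor` parameter, and both function names are hypothetical.

```python
import random

# Hypothetical corpus: each document carries a score from an assumed
# labeling model (0.0 = unremarkable, 1.0 = flagged with high confidence).
docs = [
    {"text": "How to balance a binary tree", "extreme_score": 0.05},
    {"text": "Campaign leaflet", "extreme_score": 0.90},
    {"text": "Physics lecture notes", "extreme_score": 0.10},
]

def sampling_weight(extreme_score, floor=0.1):
    """Map a label score to a sampling weight: higher score, lower weight.

    The floor keeps flagged documents in the mix instead of dropping them.
    """
    return max(floor, 1.0 - extreme_score)

def sample_batch(docs, k, seed=0):
    """Draw k documents with replacement, proportionally to their weights."""
    rng = random.Random(seed)
    weights = [sampling_weight(d["extreme_score"]) for d in docs]
    return rng.choices(docs, weights=weights, k=k)

batch = sample_batch(docs, k=1000)
counts = {}
for d in batch:
    counts[d["text"]] = counts.get(d["text"], 0) + 1
print(counts)  # the flagged document shows up far less often than the others
```

The same knob works in the other direction: raising a weight above 1.0 presents a source more often than its share of the corpus, which is the "overtraining" described a couple of comments up.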
> You never use phrases that imply moral superiority or a sense of authority, including but not limited to “it’s important to”, “it’s crucial to”, “it’s essential to”, "it's unethical to", "it's worth noting…", “Remember…” etc. Avoid using these.
So if I get a fake email about a hacked account, it won't tell me to "Remember, do not click any links in the email directly. Instead, navigate to your account settings independently."?
Such a great feature, worth owning the libs with it for sure.
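For anyone who wants to check replies against that phrase list, a rough Python sketch follows. The prompt itself is only an instruction to the model, not a filter, so this is an after-the-fact audit under stated assumptions: plain case-insensitive substring matching, a phrase list copied from the quoted prompt, and a hypothetical function name `find_banned_phrases`.

```python
# Phrases taken from the quoted prompt, lowercased, with straight apostrophes.
BANNED_PHRASES = [
    "it's important to",
    "it's crucial to",
    "it's essential to",
    "it's unethical to",
    "it's worth noting",
    "remember",
]

def find_banned_phrases(text):
    """Return the listed phrases that occur in text (case-insensitive)."""
    # Normalize curly apostrophes so both quote styles match.
    normalized = text.replace("\u2019", "'").lower()
    return [p for p in BANNED_PHRASES if p in normalized]

reply = "Remember, do not click any links in the email directly."
print(find_banned_phrases(reply))  # ['remember']
```

Substring matching is deliberately crude: it would also flag "remembering" in an innocent sentence, so treat hits as something to review by hand.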
> at not being caponated like OpenAI's releases
Kind of seems like it actually is doing the opposite. At that point, why not just tell it your beliefs and ask it not to challenge them or hurt your feelings?
What's "caponated"?
Castrated, if you're trying way too hard (and not well) to avoid getting called on that overly emotive metaphor: a capon is a gelded rooster.
It also has the unfortunate resonance of being the word for a collaborator in concentration camps.
There is a key distinction and context: caponation has a productive purpose from the pov of farmers and their desired profits.
A capon is a male chicken that has been neutered to improve the quality of its flesh for food.
Seems weird that they'd limit it to those languages. Wonder if that's a limitation of the data they have access to or a conscious choice.