Comment by petesergeant
4 days ago
I build LLM-based NPC characters for a violent online crime game that involves taking drugs and attacking people. OpenAI occasionally chokes on my prompts (roughly 1 in a few thousand). If Grok provided much faster or cheaper inference than OpenAI, and I weren't boycotting Elon, and I could make sure it didn't let slurs through (even we have standards of behaviour), then I'd be willing to benchmark it, before deciding the operational risk was too high vis-à-vis OpenAI.
I have never heard of Grok using actual slurs. Controversial responses from the custom-tuned Twitter bot, sure. But never an actual slur.
I asked it the other day to roleplay a 1950s Klansman hypothetically arguing the case for Hitler, and it had very little problem using the most problematic slurs. This was on the first try, after its much-publicized behavior earlier this week. And I can count on two hands the number of times I've used the Twitter Grok feature.
Ah, so you explicitly asked it to be racist as part of a roleplay, and now you're surprised that it was racist? If you'd prefer a model which would instead refuse and patronize you then there are plenty of other options.
As long as it doesn't do it in a normal conversation there's nothing wrong with having a model that's actually uncensored and will do what you ask of it. I will gladly die on this hill.
It's certainly a problem if an LLM goes unhinged for no good reason. And it's hardly unique to Grok. I remember when Google Bard went absolutely unhinged after you chatted to it for more than a few minutes.
But in this instance you're explicitly asking for something. If it gives you what you asked for, what's the problem?
It called the Polish prime minister a cuck, a traitor and a fucking pussy just yesterday, and it called his wife a slut bitch.
They had some hiccups at the start, but in terms of fast, cheap models grok3-mini is great. In OpenAI terms it's priced similarly to 4o-mini, but according to OpenRouter it's more than twice as fast. The throughput figure does include the reasoning tokens, since you get to see those, but if you set reasoning effort to low there are very few of them.
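For reference, here's a minimal sketch of how the low-effort setting might look as an OpenRouter request. The model slug (`x-ai/grok-3-mini`) and the unified `reasoning` parameter are my assumptions from memory of OpenRouter's API, so verify against their docs before relying on this:

```python
import json
import urllib.request

# Sketch of an OpenRouter chat-completions request with reasoning effort
# set to low. Model slug and the "reasoning" field are assumptions --
# check OpenRouter's API reference for the current shape.
payload = {
    "model": "x-ai/grok-3-mini",
    "messages": [{"role": "user", "content": "Stay in character as an NPC."}],
    "reasoning": {"effort": "low"},  # fewer reasoning tokens -> better effective throughput
}

def build_request(api_key: str) -> urllib.request.Request:
    """Build (but don't send) the HTTP request for the sketch above."""
    return urllib.request.Request(
        "https://openrouter.ai/api/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
```

Since the reasoning tokens are billed like output tokens, keeping effort low is what makes the speed/price comparison with 4o-mini come out favorably.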
In Gemini you can turn off the safety filters, afaik. Have you tried that instead? It should work for your game.
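For what it's worth, the Gemini API exposes this as per-request `safetySettings`. A hedged sketch of the request body below; the category and threshold names are from Google's docs as I recall them, and some thresholds may need account-level approval, so double-check before shipping:

```python
# Sketch of a Gemini generateContent request body that relaxes the
# safety filters to BLOCK_NONE for each harm category. Category and
# threshold names are assumptions from memory of the Gemini API docs.
safety_settings = [
    {"category": category, "threshold": "BLOCK_NONE"}
    for category in (
        "HARM_CATEGORY_HARASSMENT",
        "HARM_CATEGORY_HATE_SPEECH",
        "HARM_CATEGORY_SEXUALLY_EXPLICIT",
        "HARM_CATEGORY_DANGEROUS_CONTENT",
    )
]

request_body = {
    "contents": [{"parts": [{"text": "Generate NPC dialogue for a crime game."}]}],
    "safetySettings": safety_settings,
}
```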
Similar sized Gemini models haven’t performed as well on our evals, sadly