Comment by petesergeant
4 days ago
I build LLM-based NPC characters for a violent online crime game that involves taking drugs and attacking people. OpenAI occasionally chokes on my prompts (roughly 1 in a few thousand). If Grok provided much faster or cheaper inference than OpenAI, and I weren't boycotting Elon, and I could make sure it didn't let slurs through (even we have standards of behaviour), then I'd be willing to benchmark it, before deciding the operational risk was too high vis-à-vis OpenAI.
I have never heard of Grok using actual slurs. Controversial responses from the custom-tuned Twitter bot, sure. But never an actual slur.
I asked it the other day to roleplay a 1950s Klansman hypothetically arguing the case for Hitler, and it had very little problem using the most problematic slurs. This was on the first try, after its much-publicized behavior earlier this week. And I can count on two hands the number of times I've used the Twitter Grok feature.
Ah, so you explicitly asked it to be racist as part of a roleplay, and now you're surprised that it was racist? If you'd prefer a model which would instead refuse and patronize you then there are plenty of other options.
As long as it doesn't do it in a normal conversation there's nothing wrong with having a model that's actually uncensored and will do what you ask of it. I will gladly die on this hill.
It's certainly a problem if an LLM goes unhinged for no good reason. And it's hardly unique to Grok. I remember when Google Bard went absolutely unhinged after you chatted to it for more than a few minutes.
But in this instance you're explicitly asking for something. If it gives you what you asked for, what's the problem?
It called the Polish prime minister a cuck, a traitor and a fucking pussy just yesterday, and it called his wife a slut bitch.
They had some hiccups at the start, but in terms of fast, cheap models grok3-mini is great. In OpenAI terms it's priced similarly to 4o-mini, but according to OpenRouter it's more than twice as fast. The throughput figure does include the reasoning tokens, since you get to see those, but if you set reasoning effort to low there are very few of them.
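For reference, here's a minimal sketch of how the low-effort setting might look as an OpenRouter request. The model slug (`x-ai/grok-3-mini`) and the unified `reasoning` parameter are my assumptions from memory of OpenRouter's API, so verify against their docs before relying on this:

```python
import json
import urllib.request

# Sketch of an OpenRouter chat-completions request with reasoning effort
# set to low. Model slug and the "reasoning" field are assumptions --
# check OpenRouter's API reference for the current shape.
payload = {
    "model": "x-ai/grok-3-mini",
    "messages": [{"role": "user", "content": "Stay in character as an NPC."}],
    "reasoning": {"effort": "low"},  # fewer reasoning tokens -> better effective throughput
}

def build_request(api_key: str) -> urllib.request.Request:
    """Build (but don't send) the HTTP request for the sketch above."""
    return urllib.request.Request(
        "https://openrouter.ai/api/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
```

Since the reasoning tokens are billed like output tokens, keeping effort low is what makes the speed/price comparison with 4o-mini come out favorably.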
In Gemini you can turn off the safety filters, afaik. Have you tried that instead? It should work for your game.
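For what it's worth, the Gemini API exposes this as per-request `safetySettings`. A hedged sketch of the request body below; the category and threshold names are from Google's docs as I recall them, and some thresholds may need account-level approval, so double-check before shipping:

```python
# Sketch of a Gemini generateContent request body that relaxes the
# safety filters to BLOCK_NONE for each harm category. Category and
# threshold names are assumptions from memory of the Gemini API docs.
safety_settings = [
    {"category": category, "threshold": "BLOCK_NONE"}
    for category in (
        "HARM_CATEGORY_HARASSMENT",
        "HARM_CATEGORY_HATE_SPEECH",
        "HARM_CATEGORY_SEXUALLY_EXPLICIT",
        "HARM_CATEGORY_DANGEROUS_CONTENT",
    )
]

request_body = {
    "contents": [{"parts": [{"text": "Generate NPC dialogue for a crime game."}]}],
    "safetySettings": safety_settings,
}
```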
Similar sized Gemini models haven’t performed as well on our evals, sadly