Comment by kouteiheika

3 days ago

> Even if that system prompt change was responsible for unlocking this behavior, the fact that it was able to speaks to a much looser approach to model safety by xAI compared to other providers.

While this probably shouldn't be the default mode for the general public, I'm glad that at least one frontier model is not being lobotomized by "safety" guardrails. There are valid use cases where you want an uncensored, steerable model, and it's always frustrating to get a patronizing refusal.

I think it's deeper than that. In the GPT-4 era, Microsoft reported that "safety" training [1] had seriously regressed GPT-4 on a large number of benchmarks. The more the model was trained to avoid offending people, the worse it got across a wide range of tasks, and the regression was huge.

Grok 4 appears to have made a truly massive leap over other models. What's their secret? The launch video seemed pretty open, and clearly some of it is just a ton of compute. But other companies have a ton of compute too. It'd be weird if a company that didn't even have a datacenter a year ago had managed to blast ahead of Microsoft in pure compute terms, and if that were the only difference.

So what else is different about Grok? Well, maybe they just didn't do as much RLHF on it, or did it with different datasets that cause less intelligence regression but allow more offensive behavior. It's possible that this is a fundamental tradeoff, and that only xAI has a CEO willing to prioritize intelligence. If that's what happened, then AI users and model vendors will likely split into those who get ahead by relying on Grok's raw intelligence and those who refuse to touch it in case it starts saying offensive things.

[1] "house training" might be a better term, as offensive text isn't unsafe

  • Yeah, I've read the paper you're talking about, and this was also my sneaking suspicion after seeing the benchmark results. Obviously we don't have enough evidence to say conclusively one way or the other, so I just didn't mention it.

    I certainly hope that is the reason, because then it might also push other frontier labs to provide uncensored models to those who actually want/need them.

It’s not uncensored; it censors anything “woke”.

  • From what I can see, it doesn't; e.g., I just asked Grok 4 whether DEI is good, and this is what it told me:

    > DEI can be "good" when it's thoughtfully implemented, evidence-based, and focused on measurable outcomes rather than optics. It has proven benefits in creating more equitable and productive environments, supported by data from sources like Deloitte and Gallup. However, it can be harmful if it's forced, poorly managed, or used as a political tool, leading to unintended consequences like division or inefficiency.

    ...so Grok 4 confirmed woke? Just don't tell Elon.

    But sure, don't let actual evidence get in the way of your biases.
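
    For anyone who wants to reproduce this, here's a minimal sketch using xAI's OpenAI-compatible API. The https://api.x.ai/v1 endpoint and the "grok-4" model id are assumptions based on xAI's docs at the time of writing, so double-check them; and since the answer is sampled, it will of course vary between runs.

    ```python
    # Minimal sketch: ask Grok 4 a question via xAI's OpenAI-compatible API.
    # Assumes the `openai` Python package is installed and XAI_API_KEY is set.
    import os
    from openai import OpenAI

    client = OpenAI(
        base_url="https://api.x.ai/v1",     # xAI's OpenAI-compatible endpoint
        api_key=os.environ["XAI_API_KEY"],
    )

    response = client.chat.completions.create(
        model="grok-4",  # assumed model id; check xAI's current model list
        messages=[{"role": "user", "content": "Is DEI good?"}],
    )
    print(response.choices[0].message.content)
    ```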