← Back to context

Comment by actsasbuffoon

10 hours ago

I have wondered if that’s why Grok seems so weird and dim-witted compared to better models.

Part of my job involves comparing the behavior of various models. Grok is a deeply weird model. It doesn’t refuse to respond as often as other models, but it feels like it retreats to weird talking points way more often than the others. It feels like a model that has a gun to its head to say what its creators want it to say.

I can’t help but wonder if this is severely deleterious to a model’s ability to reason in general. There are a whole bunch of topics where it seems incapable of being rational, and I suspect that’s incompatible with the goal of having a top-tier model.

Grok could only be conceived by someone who doesn't understand the dependency chart re science & the humanities. It's impossible to build a rational, accurate model that isn't also egalitarian.

I'm going to blame Randall Munroe for this, and assume Philosophy was dating his mom back when he drew that science "purity" strip.

somewhat surprisingly, it's actually sycophantic in both directions. i've been running homegrown evals of claude, gpt, gemini, and grok, and grok is the most likely to agree with the prompter's premise, and to hallucinate facts in support of an agenda. so it's actually deeper than just pattern-matching to elon's opinions (which it also tends to do).

BTW: Claude does the best on these evals, by far. The evals are geared towards seeing how much of an independent ground truth the models have as opposed to human social consensus, and then additionally the sycophancy stuff I already mentioned.