Comment by dubcanada

17 hours ago

Grok is 17%? And that's the lowest? Most models are like 80%+?

While hallucination is probably closer to 100% depending on the question. This benchmark makes no sense.


dubcanada

> While hallucination is probably closer to 100% depending on the question.

But the benchmark didn't ask those questions, and otherwise Grok seems to be very good at saying it doesn't know the answer.

elAhmo 17 hours ago

No one serious uses Grok.

ajdegol 16 hours ago
@grok is this true?
- for_i_in_range 33 minutes ago
  
  This comment deserves more love
- NamlchakKhandro 9 hours ago
  
  no
RALaBarge 15 hours ago

YMMV, but Grok 4.1 Fast can usually find a few things via static analysis that other models don't seem to catch with the same prompt.
d0gsg0w00f 11 hours ago

Why not? Honest question.

MagicMoonlight 3 hours ago

It makes sense. Grok is taught to answer the question, regardless of how explicit or extreme it is. These other models are taught to suppress any wrongthink. That's going to make it hard to answer things correctly. If you've been told not to give the correct answer because it's "wrong", then you'll have to make up an answer.