Comment by TheAceOfHearts

4 days ago

Does anyone here have access to Grok 4 yet? If so, could you please try asking it to solve this basic word search problem [0] and share the results? It's just a simple grid of letters where you have to find the position of each word, the kind of problem that any young child can easily solve.

[0] https://imgur.com/VxNP5jG

They said they're training a new base model for better multimodal performance soon. I wouldn't expect it to be able to read an image like that today. Maybe if you provided it in text format.

  • As a point of interest and for comparison, Gemini 2.5 Pro is able to generate a Python program that outputs the complete correct solution when run, but it can't figure out how to one-shot the problem if asked directly.

    This is just a for-fun test to get a sense of how models are progressing; it highlights the jagged nature of their intelligence and capabilities. None of the big AI labs are testing for such a basic problem type, which makes it a bit of an interesting check.

    I think it's still interesting to see how Grok 4 performs, even if we don't use this test to draw any broader conclusions about what capabilities it offers.

  • description from openrouter:

    > Grok 4 is xAI's latest reasoning model with a 256k context window. It supports parallel tool calling, structured outputs, and both image and text inputs. Note that reasoning is not exposed, reasoning cannot be disabled, and the reasoning effort cannot be specified.

    Unfortunately, no requests are going through because of rate limits.
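
The solver Gemini 2.5 Pro reportedly produced would look something like the brute-force sketch below. The grid and word list here are made up for illustration (the linked image isn't reproduced in the thread); the search itself is the standard scan over all start cells and all eight directions.

```python
# Toy word-search solver: scan every cell in every direction.
# GRID and WORDS are illustrative stand-ins, not the puzzle from the link.
GRID = [
    "CATXQ",
    "ODOGM",
    "WZKEB",
    "SRATS",
    "NPLUM",
]
WORDS = ["CAT", "DOG", "COW", "STAR", "PLUM"]

# All eight directions: right, left, down, up, and the four diagonals.
DIRECTIONS = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)
              if (dr, dc) != (0, 0)]

def find_word(grid, word):
    """Return ((row, col), (dr, dc)) for the first match, or None."""
    rows, cols = len(grid), len(grid[0])
    for r in range(rows):
        for c in range(cols):
            for dr, dc in DIRECTIONS:
                rr, cc = r, c
                for ch in word:
                    if not (0 <= rr < rows and 0 <= cc < cols) or grid[rr][cc] != ch:
                        break
                    rr, cc = rr + dr, cc + dc
                else:  # ran through every letter without breaking
                    return (r, c), (dr, dc)
    return None

for w in WORDS:
    print(w, find_word(GRID, w))
```

That a model can write this program but not one-shot the puzzle from the image is exactly the "jagged capabilities" point above: the algorithm is trivial to state, but executing it visually requires reliable character-level perception of the grid.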

These models are not trained on character-level input. Why would anyone expect them to perform well on character-level puzzles?

  • They are trained on many billions of tokens of text dealing with character-level input; they would be rather dumb if they couldn't learn it anyway.

    Every human learns this: when you hear the word "strawberry" spoken aloud, you don't hear the double r, yet you still know how it's spelled.

    • These models operate on tokens, not characters. It’s true that training budgets could be spent on exhaustively enumerating how many of each letter are in every word in every language, but it’s just not useful enough to be worth it.

      It’s more like asking a human for the Fourier components of how they pronounce “strawberry”. I mean the audio waves are right there, why don’t you know?
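
      The token-vs-character point can be made concrete with a toy tokenizer. The vocabulary below is invented for illustration (real BPE vocabularies are learned from data), but the effect is the same: the model receives subword tokens, and letter counts are simply not present in its input.

      ```python
      # Hedged sketch: greedy longest-match subword tokenization over a
      # made-up vocabulary, to show why a token-level model never "sees"
      # individual letters.
      VOCAB = {"straw", "berry", "st", "raw", "ber", "ry",
               "a", "b", "e", "r", "s", "t", "w", "y"}

      def tokenize(word, vocab):
          """Greedily take the longest vocabulary entry at each position."""
          tokens, i = [], 0
          while i < len(word):
              for j in range(len(word), i, -1):
                  if word[i:j] in vocab:
                      tokens.append(word[i:j])
                      i = j
                      break
              else:
                  raise ValueError(f"no token covers {word[i]!r}")
          return tokens

      print(tokenize("strawberry", VOCAB))  # ['straw', 'berry']
      # Counting the r's requires character access the model's input
      # never provides:
      print("strawberry".count("r"))        # 3
      ```

      The model is handed two opaque token IDs; recovering "three r's" from them is a memorization task, not a perception task, which is the Fourier-components analogy above.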
