
Comment by brookst

7 hours ago

It’s like defending a test showing hammers are terrible at driving screws by saying many people are unclear on how to use tools.

It remains unsurprising that a technology that lumps characters together is not great at processing below its resolution.
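For concreteness, here is a minimal sketch of what "lumps characters together" means in practice, assuming the tiktoken library and OpenAI's cl100k_base vocabulary (an assumption; other models use different tokenizers, so the exact splits vary):

```python
# Minimal sketch, assuming tiktoken and the cl100k_base vocabulary.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

for word in ["strawberries", "blueberries"]:
    ids = enc.encode(word)
    pieces = [enc.decode([i]) for i in ids]
    # The model sees multi-character chunks, not individual letters,
    # so "count the r's" asks about detail below its input resolution.
    print(word, "->", pieces)
```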

Now, if there are use cases other than synthetic tests where this capability is important, maybe there’s something interesting. But just pointing out that one can’t actually climb the trees pictured on the map is not that interesting.

And yet... now many of them can do it. I think it's premature to say "this technology is for X" when what it was originally invented for was translation, and every capability it has developed since then has been an immense surprise.

  • > And yet... now many of them can do it.

    Presumably because they trained them to death on this useless test that people somehow just wouldn't shut up about.

    • Which is why, in the linked post, I test models against both "r's in strawberries" and "b's in blueberries" to see if that is the case.

      tl;dr: the first case had near-perfect accuracy, as you would expect if the LLMs had indeed been trained on it; the second case did not. (A minimal sketch of the ground-truth side of that check is below.)
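The ground-truth side of that comparison is trivial to compute; a minimal sketch follows (how the model answers were collected is not shown here, since that depends on which API the post queried):

```python
# Minimal sketch of the exact counts the models are graded against;
# collecting the model answers themselves is omitted (API-dependent).
def letter_count(word: str, letter: str) -> int:
    return word.lower().count(letter.lower())

cases = [("strawberries", "r"), ("blueberries", "b")]
for word, letter in cases:
    print(f"How many {letter}'s in '{word}'? ground truth = {letter_count(word, letter)}")
```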