Comment by minimaxir

4 months ago

Which is why, in the linked post, I test models against both the "r's in strawberries" and the "b's in blueberries" prompts to see if that is the case.

tl;dr the first case had near-perfect accuracy, as you'd expect if the LLMs were indeed trained on it. The second case did not.
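For anyone who wants to score this kind of test themselves, here is a minimal sketch of the ground-truth side. It is not the code from the linked post: `count_letter` and `accuracy` are hypothetical helper names, and it assumes you have already parsed each model reply down to a single integer.

```python
def count_letter(word: str, letter: str) -> int:
    """Case-insensitive count of a single letter in a word."""
    return word.lower().count(letter.lower())


def accuracy(answers: list[int], word: str, letter: str) -> float:
    """Fraction of parsed model answers that match the true count.

    `answers` is assumed to be one integer per model reply.
    """
    if not answers:
        raise ValueError("need at least one answer to score")
    truth = count_letter(word, letter)
    return sum(a == truth for a in answers) / len(answers)


# Ground truth for the two prompts discussed above.
print(count_letter("strawberries", "r"))  # 3
print(count_letter("blueberries", "b"))   # 2
```

Computing the expected count in code rather than hardcoding it makes it easy to swap in other word/letter pairs that are less likely to show up in training data.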