Comment by Erem
2 months ago
With data starvation driving AI companies towards synthetic data, I'm surprised that an easily synthesized problem like this hasn't been trained out of relevance. Yet here we are with proof that it hasn't.
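To show how easy this problem is to synthesize, here is a minimal sketch of a letter-counting example generator. The word list, prompt template, and function names are hypothetical and chosen only for illustration. Every label comes from `str.count`, so the data is correct by construction.

```python
import random

# Hypothetical seed vocabulary; a real pipeline would draw from a large word list.
WORDS = ["strawberry", "banana", "mississippi", "bookkeeper", "committee"]


def make_example(word: str, letter: str) -> dict:
    """Build one prompt/answer pair whose label is computed, not guessed."""
    count = word.count(letter)
    return {
        "prompt": f"How many times does the letter '{letter}' appear in '{word}'?",
        "answer": str(count),
    }


def generate(n: int, seed: int = 0) -> list[dict]:
    """Generate n examples deterministically from a seeded RNG."""
    rng = random.Random(seed)
    examples = []
    for _ in range(n):
        word = rng.choice(WORDS)
        # Mix letters from the word with letters that may not occur in it,
        # so zero counts show up in the data too.
        letter = rng.choice(word + "xyzq")
        examples.append(make_example(word, letter))
    return examples


if __name__ == "__main__":
    for ex in generate(3):
        print(ex["prompt"], "->", ex["answer"])
```

Scaling this up to millions of examples is a matter of a bigger word list and a larger `n`. That is why it seems cheap to train away.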
2 months ago
Are we a hundred percent sure it isn't a deliberate watermark?
A quick test anyone can run to say, yup, that's a model XYZ derivative running under the hood.
Because, as you quite rightly point out, it is trivial to train the model not to have this behaviour. For me, that is when Occam kicks in.
I remember initially believing the explanation for the Strawberry problem, but one day I sat down and thought about it, and realized it made absolutely zero sense.
The explanation that Karpathy was popularizing was that it has to do with tokenization.
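For context, the tokenization explanation goes roughly like this: the model does not receive individual letters, only integer IDs for subword chunks. The sketch below uses a made-up vocabulary and a greedy longest-match splitter. This is a simplification and does not reproduce real BPE or any particular model's tokenizer. It shows how "strawberry" might be chunked.

```python
# Toy vocabulary invented for illustration; real BPE vocabularies are learned
# from data and contain tens of thousands of entries.
VOCAB = {
    "str": 101, "aw": 102, "berry": 103,
    "s": 1, "t": 2, "r": 3, "a": 4, "w": 5, "b": 6, "e": 7, "y": 8,
}


def tokenize(text: str) -> list[str]:
    """Greedy longest-match segmentation against VOCAB (a stand-in for BPE)."""
    pieces = []
    i = 0
    while i < len(text):
        # Try the longest remaining substring first, shrinking until a match.
        for end in range(len(text), i, -1):
            if text[i:end] in VOCAB:
                pieces.append(text[i:end])
                i = end
                break
        else:
            raise ValueError(f"no vocabulary entry covers {text[i]!r}")
    return pieces


if __name__ == "__main__":
    pieces = tokenize("strawberry")
    ids = [VOCAB[p] for p in pieces]
    print(pieces, ids)  # ['str', 'aw', 'berry'] [101, 102, 103]
```

Under this toy vocabulary, the three `r`s are split between the `str` and `berry` chunks. The IDs 101, 102, 103 carry no explicit letter information.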
However, models are not conscious of tokens, and they certainly don't have any ability to count them without tool help.
Additionally, if it were a tokenization issue, we would expect to see the same kind of failure everywhere, not just on this one word.
So yeah, I'm thinking it's a model tag or insignia of some kind, similar to the fun logos you find when examining many silicon integrated circuits under a microscope.