Comment by zone411

7 months ago

The exact questions are almost certainly not in the training data: extra words are added to each puzzle, and I don't publish those alongside the original words. There's still a slight chance that my previous API requests were used for training.

To guard against potential training data contamination, I separately calculate the score using only the newest 100 puzzles. Grok 4 still leads.
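
For anyone who wants to run the same kind of check on their own results, here is a minimal sketch of a newest-N score. It is not the benchmark's actual code. It assumes each result has a puzzle id that increases over time and a per-puzzle score between 0 and 1; `PuzzleResult`, `mean_score`, and `newest_subset_score` are hypothetical names made up for this example.

```python
from dataclasses import dataclass


@dataclass
class PuzzleResult:
    puzzle_id: int  # assumption: a larger id means a newer puzzle
    score: float    # assumption: per-puzzle score in [0, 1]


def mean_score(results: list[PuzzleResult]) -> float:
    """Average score over the given results."""
    if not results:
        raise ValueError("no results to score")
    return sum(r.score for r in results) / len(results)


def newest_subset_score(results: list[PuzzleResult], n: int = 100) -> float:
    """Average score over only the n newest puzzles (by puzzle_id)."""
    newest = sorted(results, key=lambda r: r.puzzle_id, reverse=True)[:n]
    return mean_score(newest)
```

If the score on the newest puzzles is far below the overall score, that would hint at contamination on the older ones; if the ranking holds on the newest subset, contamination is a less likely explanation.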