Comment by zone411
3 days ago
The exact questions are almost certainly not in the training data, since extra words are added to each puzzle, and I don't publish these along with the original words (though there's a slight chance they used my previous API requests for training).
To guard against potential training data contamination, I separately calculate the score using only the newest 100 puzzles. Grok 4 still leads.
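A minimal sketch of that recency check, for illustration only. The record fields (`puzzle_id`, `published`, `solved`) and the fraction-solved metric are assumptions, not the benchmark's actual schema or scoring; only the "keep the newest 100 puzzles" step comes from the comment above.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class PuzzleResult:
    # Hypothetical record layout; the real benchmark's fields are not given.
    puzzle_id: int
    published: date
    solved: bool


def score(results):
    """Fraction of puzzles solved (assumed metric); 0.0 for an empty list."""
    if not results:
        return 0.0
    return sum(r.solved for r in results) / len(results)


def newest_subset_score(results, n=100):
    """Score only the n most recently published puzzles."""
    newest = sorted(results, key=lambda r: r.published, reverse=True)[:n]
    return score(newest)
```

If a model's score on the newest subset is close to its score on the full set, memorized older puzzles are unlikely to be driving the ranking.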