Comment by riku_iki
11 days ago
> If gemini-3-deepthink gets above 85% on the private eval set, it will be considered "solved"
They never will on the private set, because that would mean it's being leaked to Google.