Comment by XCSme
8 hours ago
Funnily, in my tests, 3 Flash with medium reasoning does better. It seems 3.1 Pro reasoned its way to the correct answer, but then chose to go with a different (wrong) one: https://aibenchy.com/compare/?left=google-gemini-3-flash-pre...
EDIT: 3 Flash does better while also being 3x cheaper.