Comment by caesil
2 years ago
If you think eval numbers mean a model is close to GPT-4, then you clearly haven't been scarred by the legions of open source models which claim GPT-4-level evals but struggle to actually perform challenging work as soon as you start testing them.
Perhaps Gemini is different and Google has tapped into their own OpenAI-like secret sauce, but I'm not holding my breath.