Comment by rolisz
2 years ago
What is up with that eval @32? Am I reading it correctly that they are generating 32 responses and taking the majority? Who would use the API like that? That feels like such a fake way to improve metrics.
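A minimal sketch of the majority-vote reading of "@32" described above: sample k answers to the same question and keep the most common one. The function name `majority_at_k` and the cycling fake model are hypothetical stand-ins, not any real API, and this is not necessarily how the reported numbers were computed. It does make the cost point concrete: each scored question takes k separate model calls.

```python
from collections import Counter
from itertools import cycle
from typing import Callable


def majority_at_k(sample: Callable[[], str], k: int = 32) -> tuple[str, float]:
    """Sample k answers and return the most common one with its vote share.

    Ties go to the answer seen first (Counter.most_common keeps
    first-encountered order among equal counts).
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    answers = [sample() for _ in range(k)]  # k separate model calls
    answer, votes = Counter(answers).most_common(1)[0]
    return answer, votes / k


# Hypothetical stand-in for a model: cycles through fixed answers
# instead of calling any real API.
fake_model = cycle(["B", "A", "B", "C"])
print(majority_at_k(lambda: next(fake_model), k=32))  # ('B', 0.5)
```

A single sample from the same fake model would return "B" only half the time, while the 32-sample vote returns it every time. That gap is why a maj@32 score and a single-sample score are not directly comparable.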
This also jumped out at me. It also seems that they are selectively choosing different prompting strategies, one of which is listed as "CoT@32".
Makes it seem like they really needed to get creative to have it beat GPT-4. Not a good sign imho.
Page 7 of their technical report [0] has a better apples-to-apples comparison. Why they chose to show apples to oranges on their landing page is odd to me.
[0] https://storage.googleapis.com/deepmind-media/gemini/gemini_...
I assume these landing pages are made for wall st analysts rather than people who understand LLM eval methods.
True, but even some of the apples-to-apples comparisons favor Gemini Ultra: 90.04% CoT@32 vs. GPT-4's 87.29% CoT@32 (via API).
Showing dominance in AI is also targeted at their enterprise customers who spend millions on Google Cloud services.