Comment by rolisz

2 years ago

What is up with that eval @32? Am I reading it correctly that they are generating 32 responses and taking the majority answer? Who will use the API like that? That feels like such a fake way to improve metrics.
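
For readers unfamiliar with the "@32" notation, here is a minimal sketch of the reading described above: sample k responses and keep the most common final answer. This is an illustration of plain majority voting only, and the procedure in the technical report may differ. `majority_at_k` and `make_fake_sampler` are hypothetical names, and the fake sampler stands in for a real model call.

```python
from collections import Counter
from itertools import cycle
from typing import Callable, Iterable


def majority_at_k(sample_answer: Callable[[str], str], prompt: str, k: int = 32) -> str:
    """Sample k answers for one prompt and return the most frequent one.

    `sample_answer` is a hypothetical stand-in for a single (non-deterministic)
    model call that returns only the final answer string.
    """
    answers = [sample_answer(prompt) for _ in range(k)]
    # most_common(1) returns [(answer, count)]; ties go to the answer seen first.
    return Counter(answers).most_common(1)[0][0]


def make_fake_sampler(answers: Iterable[str]) -> Callable[[str], str]:
    """Build a deterministic fake 'model' that cycles through canned answers."""
    stream = cycle(answers)
    return lambda prompt: next(stream)


if __name__ == "__main__":
    # The fake model answers "B" one time in three and "A" otherwise.
    sampler = make_fake_sampler(["B", "A", "A"])
    print(majority_at_k(sampler, "Which option is correct?", k=32))
```

The cost is what the comment objects to: one scored answer requires k model calls, so k=32 means roughly 32 times the API usage per question.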

technics256 2 years ago

This also jumped out at me. It also seems that they are selectively choosing different prompting strategies too; one lists "CoT@32".

Makes it seem like they really needed to get creative to have it beat GPT-4. Not a good sign imho.

bryanh 2 years ago

Page 7 of their technical report [0] has a better apples-to-apples comparison. Why they chose to show an apples-to-oranges comparison on their landing page is odd to me.

[0] https://storage.googleapis.com/deepmind-media/gemini/gemini_...

polygamous_bat 2 years ago
I assume these landing pages are made for wall st analysts rather than people who understand LLM eval methods.
- bryanh 2 years ago
  
  True, but even some of the apples-to-apples comparison is favorable to Gemini Ultra: 90.04% CoT@32 vs. GPT-4's 87.29% CoT@32 (via API).
- rockinghigh 2 years ago
  
  Showing dominance in AI is also targeted at their enterprise customers who spend millions on Google Cloud services.