
Comment by rolisz

2 years ago

What is up with that eval @32? Am I reading it correctly that they are generating 32 responses and taking majority? Who will use the API like that? That feels like such a fake way to improve metrics

This also jumped out at me. It also seems that they are selectively choosing different prompting strategies; one lists "CoT@32".

Makes it seem like they really needed to get creative to have it beat GPT-4. Not a good sign imho.
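For anyone unsure what the "@32" in question means, here is a minimal sketch of majority voting as the quoted comment reads it: sample the model k times and keep the most common answer. The names `majority_at_k`, `sample` and `make_fake_sampler` are hypothetical, the sampler stands in for a real API call, and the report's exact procedure may differ.

```python
from collections import Counter
from typing import Callable, Iterable


def majority_at_k(sample: Callable[[str], str], prompt: str, k: int = 32) -> str:
    """Call `sample` k times on the same prompt and return the most common answer.

    `sample` is a hypothetical stand-in for one model call that returns
    a final answer string (e.g. "A", "B", "C" or "D" on a multiple-choice eval).
    """
    answers = [sample(prompt) for _ in range(k)]
    # most_common(1) returns a list holding the single (answer, count) pair
    # with the highest count.
    return Counter(answers).most_common(1)[0][0]


def make_fake_sampler(answers: Iterable[str]) -> Callable[[str], str]:
    """Build a deterministic fake sampler that replays a fixed answer list."""
    it = iter(answers)
    return lambda prompt: next(it)


# Assumed toy data: 20 of 32 samples say "B", 12 say "A".
fake_answers = ["A"] * 12 + ["B"] * 20
result = majority_at_k(make_fake_sampler(fake_answers), "some question", k=32)
print(result)
```

The cost side of the complaint is visible here: scoring one question takes k model calls instead of one, which is not how most people call the API.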

Page 7 of their technical report [0] has a better apples-to-apples comparison. Why they chose to show an apples-to-oranges comparison on their landing page is odd to me.

[0] https://storage.googleapis.com/deepmind-media/gemini/gemini_...

  • I assume these landing pages are made for Wall Street analysts rather than people who understand LLM eval methods.

    • Showing dominance in AI is also targeted at their enterprise customers who spend millions on Google Cloud services.