Comment by polygamous_bat
2 years ago
I assume these landing pages are made for Wall St. analysts rather than people who understand LLM eval methods.
2 years ago
True, but even some of the apples-to-apples comparisons favor Gemini: Ultra scores 90.04% with CoT@32 vs. GPT-4's 87.29% with CoT@32 (via API).
This isn't apples to apples - they're taking the optimal prompting technique for their own model, then using that technique for both models. They should be comparing it against the optimal prompting technique for GPT-4.
Showing dominance in AI is also targeted at their enterprise customers who spend millions on Google Cloud services.