Comment by mkl

14 days ago

That Gemini 2.5 one is impressive. I found it interesting that the blog post didn't mention Gemini 2.5 at all. Okay, it was released pretty recently, but 10 days seems like enough time to run the benchmarks, so maybe the results make Llama 4 look worse?

I'm sure they do, as Gemini 2.5 Pro has been making every other model look pretty bad.

Meta will most likely compare against it when they release the upcoming Llama 4 reasoning model.