Gemini 3 Pro Model Card [pdf]

7 hours ago (storage.googleapis.com)

It says it's been trained from scratch. I wonder if it will have the same indescribable magic that makes me spend an hour every day with 2.5. I really love the results I can get with 2.5 Pro. Google eventually limiting AI Studio will be a sad day.

Also I really hoped for a 2M+ context. I'm living on the context edge even with 1M.

They scored a 31.1% on ARC AGI 2 which puts them in first place.

Also notable which models they include for comparison: Gemini 2.5 Pro, Claude Sonnet 4.5, and GPT-5.1. That seems like a minor snub against Grok 4 / Grok 4.1.

  • My impression is that Grok is very rarely used in practice outside of a niche of die-hard users, partly because of very different tuning to other models, and partly the related public reputation around it.

    https://firstpagesage.com/reports/top-generative-ai-chatbots... suggests 0.6% of chat use cases, well below the other big names, and I suspect those stats for chat are higher than other scenarios like business usage. Given all that, I can see how Gemini might not be focused on competing with them.

    • Well, there are three ways to use Grok:
      - Inside X/Twitter: most people interact with Grok this way.
      - On its website: this is really annoying, as you get delayed by Cloudflare every time you access the site. As Grok does not provide a serious advantage over other services, why bother?
      - Through the app: it is not as convenient as other services.

      It is understandable that Grok is not popular.

    • I don’t know anyone who uses Grok, but in my peer group everyone uses 1-2 paid services like Gemini or Claude or ChatGPT. They’re probably not as “extremely online” as I am, so I can’t generalize this thought, but anecdotally my impression has been that Grok is just very “right wing influencer” coded.

  • Grok seems extremely prone to hallucination in my experience. It also constantly asserts certainty on fuzzy topics.

  • About ARC 2:

    I would want to hear more detail about prompts, frameworks, thinking time, etc., but they don't matter too much. The main caveat would be that this is probably on the public test set, so could be in pretraining, and there could even be some ARC-focussed post-training - I think we don't know yet and might never know.

    But for any reasonable setup, assuming no egregious cheating, that is an amazing score on ARC 2.

> The training dataset also includes: publicly available datasets that are readily downloadable; data obtained by crawlers; licensed data obtained via commercial licensing agreements; user data (i.e., data collected from users of Google products and services to train AI models, along with user interactions with the model) in accordance with Google’s relevant terms of service, privacy policy, service-specific policies, and pursuant to user controls, where appropriate; other datasets that Google acquires or generates in the course of its business operations, or directly from its workforce; and AI-generated synthetic data.

Well, don't complain when you are using Gmail and your emails are being used to train Gemini.

  • It says "pursuant to user controls, where appropriate". We can now sleep peacefully with the knowledge that Google will give us the tools to disable this where it's not inappropriate.