Comment by Alifatisk
17 days ago
Exciting to see these models being released to the Gemini app. I do wish my choice of default model were saved across sessions.
How many tokens can gemini.google.com handle as input? How large is the context window before it forgets? A quick search said it's a 128k-token window, but that applies to Gemini 1.5 Pro, so what is it now?
My assumption is that "Gemini 2.0 Flash Thinking Experimental" is just "Gemini 2.0 Flash" with reasoning, and "Gemini 2.0 Flash Thinking Experimental with apps" is just "Gemini 2.0 Flash Thinking Experimental" with access to the web and Google's other services, right? So sticking with "Gemini 2.0 Flash Thinking Experimental with apps" should be the optimal choice.
Is there any reason why Gemini 1.5 Flash is still an option? It feels like it should be removed unless it does something better than the others.
I have difficulty understanding where each variant of the Gemini model is best suited. Looking at aistudio.google.com, they have already updated the available models.
Is "Gemini 2.0 Flash Thinking Experimental" on gemini.google.com just "Gemini-Exp-1206", or is it "Gemini Flash Thinking Experimental" from aistudio.google.com?
I have a note in my notes app where I rank every LLM based on instruction following and math, and to this day I've had difficulty knowing where to place each Gemini model. I know there is a little popup when you hover over each model that tries to explain what it does and which tasks it is best suited for, but those explanations have been very vague to me. And I haven't even started on the Gemini Advanced series, or whatever I should call it.
The models now available on aistudio are:
- Gemini 2.0 Flash (gemini-2.0-flash)
- Gemini 2.0 Flash Lite Preview (gemini-2.0-flash-lite-preview-02-05)
- Gemini 2.0 Pro Experimental (gemini-2.0-pro-exp-02-05)
- Gemini 2.0 Flash Thinking Experimental (gemini-2.0-flash-thinking-exp-01-21)
If I had to sort these from most likely to fulfill my needs to least likely, it would probably be:
gemini-2.0-flash-thinking-exp-01-21 > gemini-2.0-pro-exp-02-05 > gemini-2.0-flash-lite-preview-02-05 > gemini-2.0-flash
Why? Because aistudio describes gemini-2.0-flash-thinking-exp-01-21 as being able to tackle the most complex and difficult tasks, while gemini-2.0-pro-exp-02-05 and gemini-2.0-flash-lite-preview-02-05 differ only in how much context they can handle.
So with that out of the way, how does gemini-2.0-flash-thinking-exp-01-21 compare against o3-mini, Qwen 2.5 Max, Kimi k1.5, DeepSeek R1, DeepSeek V3, and Sonnet 3.5?
My current list of benchmarks to go through is artificialanalysis.ai, lmarena.ai, livebench.ai, and aider.chat's polyglot benchmark, but the whole Gemini suite is still difficult to reason about and sort out.
I feel like this trend of having many models with the same name but different suffixes is starting to become an obstacle to my mental model.
Update: I found that AI Studio's changelog explains where each model fits best.
https://aistudio.google.com/app/changelog#february-5th-2025
https://aistudio.google.com/app/changelog#december-19-2024