Is there any reason why Gemini 1.5 Flash is still an option? Feels like it should be removed as an option unless it does something better than the others.
I have difficulty understanding what each variant of the Gemini model is best suited for. Looking at aistudio.google.com, they have already updated the available models.
Is "Gemini 2.0 Flash Thinking Experimental" on gemini.google.com just "Gemini experiment 1206" or was it "Gemini Flash Thinking Experimental" aistudio.google.com?
I have a note in my notes app where I rank every LLM based on instruction following and math; to this day, I've had difficulty knowing where to place each Gemini model. I know there is a little popup when you hover over each model that tries to explain what it does and which tasks it is best suited for, but these explanations have been very vague to me. And I haven't even started on the Gemini Advanced series, or whatever I should call it.
The available models on aistudio are now:
- Gemini 2.0 Flash (gemini-2.0-flash)
- Gemini 2.0 Flash Lite Preview (gemini-2.0-flash-lite-preview-02-05)
- Gemini 2.0 Pro Experimental (gemini-2.0-pro-exp-02-05)
Why? Because aistudio describes gemini-2.0-flash-thinking-exp-01-21 as being able to tackle the most complex and difficult tasks, while gemini-2.0-pro-exp-02-05 and gemini-2.0-flash-lite-preview-02-05 only differ in how much context they can handle.
So with that out of the way, how does Gemini-2.0-flash-thinking-exp-01-21 compare against o3-mini, Qwen 2.5 Max, Kimi k1.5, DeepSeek R1, DeepSeek V3 and Sonnet 3.5?
My current list of benchmarks I go through is artificialanalysis.ai, lmarena.ai, livebench.ai and aider.chat's polyglot benchmark, but still, the whole Gemini suite is difficult to reason about and sort out.
I feel like this trend of having many different models with the same name but different suffixes is starting to become an obstacle to my mental model.
Update: I found that AI Studio's changelog gives an explanation of where each model fits best.
Gemini 2.0 Flash is just Google's production-ready model
Gemini 2.0 Flash-Lite Preview is their smallest model, for high-volume tasks
Gemini 2.0 Pro Experimental is the strongest Gemini model
Gemini 2.0 Flash Thinking Experimental (gemini-2.0-flash-thinking-exp-1219) explicitly shows its thoughts
Why does no one mention that you must log in with a Google account, with all of the record keeping, cross-correlation and 3rd-party access implied there?
I wonder how common this is, but my interest in this product is 0 simply because my level of trust and feeling of goodwill for Google almost couldn’t be lower.
It’s funny, I’ve never actually used Gemini and, though this may be incorrect, I automatically assume it’s awful. I assume it’s awful because the AI summaries at the top of Google Search are so awful, and that’s made me never give Google AI a chance.
I don't think your take is incorrect. I give it a try from time to time and it's always been inferior to other offerings for me every time I've tested it. Which I find a bit strange as NotebookLM (until recently) had been great to use. Whatever... there are plenty of other good options out there.
This is a lie, since I don't have a Google account, and I cannot search on Google anymore since noscript/basic (X)HTML browser interop was broken a few weeks ago.
Here's me not using Gemini 1 because the only use case I have for the old Assistant is setting a timer, and there are reports that Gemini is randomly incapable of setting one.
Pixels replaced Assistant w/ Gemini a while back and it was horrendous; it would answer questions but not perform the basic tasks you actually used Assistant for (setting a timer, navigating, home control, etc.).
Seems like they're approaching parity (finally) months and months later (alarms/TV control work now at least), but losing basic oft-used functionality is a serious fumble.
What is the model I get at gemini.google.com (i.e. through my Workspace subscription)? It says "Gemini Advanced" but there are no other details. No model selection option.
I find the lack of clarity very frustrating. If I want to try Google's "best" model, should I be purchasing something? AI Studio seems focused around building an LLM wrapper app, but I just want something to answer my questions.
Edit: what I've learned through Googling: (1) if you search "is gemini advanced included with workspace" you get an AI overview answer that seems to be incorrect, since they now include Gemini Advanced (?) with every workspace subscription.(2) a page exists telling you to buy the add-on (Gemini for Google Workspace), but clicking on it says this is no longer available because of the above. (3) gemini.google.com says "Gemini Advanced" (no idea which model) at the top, but gemini.google.com/advanced redirects me to what I have deduced is the consumer site (?) which tells me that Gemini Advanced is another $20/month
The problem, Google PMs if you're reading this, is that the gemini.google.com page does not have ANY information about what is going on. What model is this? What are the limits? Do I get access to "Deep Research"? Does this subscription give me something in aistudio? What about code artifacts? The settings option tells me I can change to dark mode (thanks!).
Edit 2: I decided to use aistudio.google.com since it has a dropdown for me on my workspace plan.
changes must be rolling out now, I can see 3 Gemini 2.0 models in the dropdown, with blue "new" badges.
screenshot: https://beeimg.com/images/g25051981724.png
This works on my personal Google account, but not on my workspace one. So I guess there's no access to 2.0 Pro then? I'm OK trying out Flash for now to see if it fixes the mistakes I ran into yesterday.
Edit: it does not. It continues to miss the fact that I'm (incorrectly) passing in a scaled query tensor to scaled_dot_product_attention. o3-mini-high gets this right.
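For anyone who hasn't hit it, the class of bug I mean looks roughly like this (an illustrative PyTorch sketch, not my actual code): F.scaled_dot_product_attention already applies the 1/sqrt(d_k) scaling internally, so pre-scaling the query scales the logits twice.

    import math
    import torch
    import torch.nn.functional as F

    q = torch.randn(1, 8, 128, 64)  # (batch, heads, seq, head_dim)
    k = torch.randn(1, 8, 128, 64)
    v = torch.randn(1, 8, 128, 64)

    # Bug: the query is scaled here AND inside the call,
    # so the attention logits get divided by sqrt(d_k) twice.
    q_scaled = q / math.sqrt(q.size(-1))
    wrong = F.scaled_dot_product_attention(q_scaled, k, v)

    # Fix: pass the raw query; the function does the scaling itself.
    right = F.scaled_dot_product_attention(q, k, v)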
If you subscribe to Gemini the menu looks like this, with the addition of 2.0 Pro.
https://imgur.com/a/xZ7hzag
It's funny how bad the UI is on some of the websites that are considered the best. Today I tried to find prices for Mistral models but I couldn't. Their pricing page leads to a 404…
Just in case you're still interested in their pricing, it's towards the bottom of [1], section "How to buy", when changing the selection from "Self-hosted" to "Mistral Cloud".
[1] https://mistral.ai/en/products/la-plateforme
if only these models were good at web development and could be used in agentic frameworks to build high-quality websites... wait...
> you get an AI overview answer that seems to be incorrect
It seems AI cannot yet defeat the obfuscation of its own product managers. Great advert for AI, that.
Google Workspace is always such a mess. I also have Google Workspace, and it did let me do some chatting in the Gemini app a few days ago. No idea what model, and of course there was no dropdown.
Just today I wanted to continue a conversation from two days ago, and after writing to the chat, I just get back an error “This chat was created with Gemini Advanced. Get it now to continue this chat.” And I don’t even know if that’s a bug, or some expected sales funnel where they gave me a nibble of it for free and now want me to pay up.
The number one reason I don't use Google Gemini is because they truncate the input text. So I can't simply paste long documents or other kinds of things as raw text in the prompt box.
If you have the need to paste long documents, why don't you just upload the file at that point?
Today I wasted an hour looking into how to use, or where to find, "Deep Research".
I could not find it. I have the Workspace Business Standard plan, which includes Gemini Advanced; I'm not sure whether I need a VPN, a separate paid AI product, a higher Workspace tier, or what the heck is going on at all.
There are so many confusing, interrelated products and such a lack of focus everywhere that I really do not know anymore whether Google is worth it as an AI provider.
You need to pay for Gemini to access it. In my experience, it's not worth it. So much potential in the experience, but the AI isn't good enough.
I'm curious about the OpenAI alternative, but am not willing to pay $200/month.
Similar feeling with the Gemini G Suite integration.
Plus one on this; it's so stupid, but also mandatory in a way. Sigh
Hmm, did you try clicking where it says "Gemini Advanced"? I find it opens a dropdown.
I just tried it but nothing happens when I click on that. You're talking about the thing on the upper left next to the open/close menu button?
"what model are you using, exact name please" is usually the first prompt I enter when trying out something.
Gemini 2.0 Flash Thinking responds with
> I am currently running on the Gemini model.
Gemini 1.5 Flash responds with
> I'm using Gemini 2.0 Flash.
I'm not even going out on a limb here in saying that question isn't going to give you an accurate response.
You'd be surprised at how confused some models are about who they are.
> available via the Gemini API in Google AI Studio and Vertex AI.
> Gemini 2.0, 2.0 Pro and 2.0 Pro Experimental, Gemini 2.0 Flash, Gemini 2.0 Flash Lite
3 different ways of accessing the API, more than 5 different but extremely similarly named models. Benchmarks only comparing to their own models.
Can't be more "Googley"!
They actually have two "studios"
Google AI Studio and Google Cloud Vertex AI Studio
And both have their own documentation, different ways of "tuning" the model.
Talk about shipping the org chart.
> Talk about shipping the org chart.
Good expression. I’ve been thinking about a way to say exactly this.
> Talk about shipping the org chart.
To be fair, Microsoft has shipped like five AI portals in the last two years. Maybe four — I don’t even know any more. I’ve lost track of the renames and product (re)launches.
I wonder what the changelogs of the two studio products tell us about internal org fights (strife)?
I don't know why you're finding it confusing. There's Duff, Duff Lite and now there's also all-new Duff Dry.
I tend to prefer Duff Original Dry and Lite, but that’s just me
I think this is a good summary: https://storage.googleapis.com/gweb-developer-goog-blog-asse...
- Experimental™
- Preview™
- Coming soon™
Working with Google APIs is often an exercise in frustration. I actually like their base cloud offering the best, but their additional APIs can be all over the place. The AI-related ones are the worst.
heuristic for Google Gemini API usage:
if the model name contains '-exp' or '-preview', then API version is 'v1alpha'
otherwise, use 'v1beta'
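In Python form, that heuristic is just this (a sketch of the rule as stated above; it's an observation, not official guidance, and the helper name is made up):

    def gemini_api_version(model_name: str) -> str:
        # Experimental/preview models seem to live on v1alpha,
        # everything else on v1beta.
        if "-exp" in model_name or "-preview" in model_name:
            return "v1alpha"
        return "v1beta"

    # gemini_api_version("gemini-2.0-pro-exp-02-05")            -> "v1alpha"
    # gemini_api_version("gemini-2.0-flash-lite-preview-02-05") -> "v1alpha"
    # gemini_api_version("gemini-2.0-flash")                    -> "v1beta"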
Honestly naming conventions in the AI world have been appalling regardless of the company
Google isn't even the worst in my opinion. Off the top of my head:
Anthropic:
Claude 1, Claude Instant 1, Claude 2, Claude Haiku 3, Claude Sonnet 3, Claude Opus 3, Claude Haiku 3.5, Claude Sonnet 3.5, Claude Sonnet 3.5v2
OpenAI:
GPT-3.5, GPT-4, GPT-4o-2024-08-06, GPT-4o, GPT-4o-mini, o1, o3-mini, o1-mini
Fun times when you try to set up throughput provisioning.
Google is the least confusing to me. Old-school version numbers, and Pro is better than Flash, which is fast and for "simple" stuff (which can be effortless intermediate-level coding at this point).
OpenAI is crazy. There may be a day when we have o5 that is reasoning and 5o that is not, where they belong to different generations too, and where "o" meant "Omni" despite o1-o3 not being audiovisual anymore like 4o.
Anthropic is crazy too. Sonnets and Haikus, just why... and a 3.5 Sonnet that was released in October that was better than 3.5 Sonnet. (Not a typo.) And no one knows why there never was a 3.5 Opus.
Mistral vs mistral.rs, Llama and llama.cpp and ollama, groq and grok. It's all terrible.
Clearly, the next step is to rename one to "Google Chat".
And kill it the 10th time
You missed the first sentence of the release:
>In December, we kicked off the agentic era by releasing an experimental version of Gemini 2.0 Flash
I guess I wasn't building AI agents in February last year.
Yeah, some of us have been working predominantly on agents for years now, but at least people are finally paying attention. Can't wait to be told how I'm following a hype cycle again.
I tried voice chat. It's very good, except for the politics
We started talking about my plans for the day, and I said I was making chili. G asked if I had a recipe or if I needed one. I said I started with Obama's recipe many years ago and have worked on it from there.
G gave me a form response that it can't talk politics.
Oh, I'm not talking politics, I'm talking chili.
G then repeated the form response and tried to change the conversation, and as long as I didn't use the O word, we were allowed to proceed. Phew.
I find it horrifying and dystopian that the part where it "Can't talk politics" is just accepted and your complaint is that it interrupts your ability to talk chilli.
"Go back to bed America." "You are free, to do as we tell you"
https://youtu.be/TNPeYflsMdg?t=143
Online the idea of "no politics" is often used as a way to try to stifle / silence discussion too. It's disturbingly fitting to the Gemini example.
I was a part of a nice small forum online. Most posts were everyday life posts / personal. The person who ran it seemed well meaning. Then a "no politics" rule appeared. It was fine for a while. I understood what they meant and even I only want so much outrage in my small forums.
Yet one person posted about how their plans to adopt were in jeopardy over their state's new rules about who could adopt what child. This was a deeply important and personal topic for that individual.
As you can guess, the "no politics" rule put a stop to that. The folks who supported laws like the ones being proposed of course thought it shouldn't be discussed because it was "politics"; others felt that this was that individual talking about their rights and life, and it wasn't "just politics". The whole forum fell apart after that debacle.
Gemini's response here sadly fits internet discourse... in a bad way.
There's nothing wrong with (and in fact much to be said in favor of) a "no politics" rule. When I was growing up it was common advice to not discuss politics/religion in mixed company. At one point I thought that was stupid fuddy-duddy advice, because people are adults and can act reasonably even if they disagree. But as I get older, I realize that I was wrong: people really, really can't control their emotions when politics comes up and it gets ugly. Turns out that the older generation was correct, and you really shouldn't talk politics in mixed company.
Obviously in this specific case the user isn't trying to talk politics, but the rule isn't dystopian in and of itself. It's simply a reflection of human nature, and that someone at Google knows it's going to be a lot of trouble for no gain if the bot starts to get into politics with users.
Hear, hear!
There has to be a better way to go about it. As I see it, to be productive, AI agents have to be able to talk about politics, because at the end of the day politics are everywhere. So, following up on what they do already, they'll have to define a model's political stance (whatever it is) and have it hold its ground, voicing an opinion or abstaining from voicing one, but continuing the conversation, as a person would (at least those of us who don't rage-quit a conversation when we hear something slightly controversial).
I agree it's ridiculous that the mention of a politician triggers the block, so it feels overly tightened (which is the story of Gemini's existence), but the alternative is that the model will have the politics of its creators/trainers. Is that preferable to you? (I suppose that depends on how well your politics align with Silicon Valley.)
Eh, OP isn't stopped from talking politics, Gemini('s owner, Google) is merely exercising its right to avoid talking about politics with OP. That said, the restriction seems too tight, since merely mentioning Obama ought not count as "politics". From a technical perspective that should be fixed.
OP can go talk politics until he's blue in the face with someone willing to talk politics with them.
One plus is that there's usually less pointless noise this way.
If the model says "sorry, no politics, let's talk about something else" - there's a tiny fraction of a minority who will make a comment like you did and be done with it. We can all move on.
If the model responds as neutrally as possible, maybe “Obama’s chilli is a great recipe, let me know when you want to begin”, we end up with ENDLESS clutching of pearls, WOKE MODEL SUGGESTS LEFT WING CHILLI BETTER THAN RIGHT WING CHILLI!!! CANCEL LEFTIST GOOGLE!!!
And then the bit that actually bugs me, just to stir up some drama you’ll get the occasional person who absolutely knows better and knows exactly what they’re doing: “I’m Professor MegaQualifications, and actually I will show that those who have criticised the models as leftist are being shut down[1] and ignored but the evidence shows they have a point…”
[1] always said unironically at a time when it’s established as a daily recurring news story being rammed down our throats because it’s one of those easy opinion generators that sells engagement like few other stories outside of mass bloodshed events
"I can't talk politics."
It's a question of right or wrong.
"I can't talk politics."
It's a question of health care.
"I can't talk politics."
It's a question of fact vs fiction, knowledge vs ignorance.
"I can't talk politics."
You are a slave to a master that does not believe in integrity, ethics, community, and social values.
"I can't talk politics."
I find it kind of useless due to the no-politics rule and I usually quickly lose my patience with it. Same with DeepSeek. Meanwhile you can have a decent conversation with Mistral, Claude, pi.ai and other LLMs. Even ChatGPT, although the patronizing apologizing tone is annoying.
Can censorship damage to LLMs be mitigated with LoRA fine-tuning?
This is AI. Someone else decides what topics and what answers are acceptable.
Note, had the same convo with ChatG and it blew right by the O word, and commented that it's nice to have an old recipe to work on over time.
I just got rejected by Gemini then Claude gave me the exact recipe.
I asked why Trump likes numbers 4 and 7. Apparently, this is forbidden topic! Google is insane.
on the other hand, I'd be very wary of anyone eating Trump's chili recipe
McDonald's doesn't serve chili
These names are unbelievably bad. Flash, Flash-Lite? How do these AI companies keep doing this?
Sonnet 3.5 v2
o3-mini-high
Gemini Flash-Lite
It's like a competition to see who can make the goofiest naming conventions.
Regarding model quality, we experiment with Google models constantly at Rev and they are consistently the worst of all the major players. They always benchmark well and consistently fail in real tasks. If this is just a small update to the gemini-exp-1206 model, then I think they will still be in last place.
Haiku/sonnet/opus are easily the best named models imo.
As a person who thought they were arbitrary names when I first discovered them and spent an hour trying to figure out the difference, I disagree. It gets even more confusing when you realize that Opus, which according to their silly naming scheme is supposed to be the biggest and best model they offer, is seemingly abandoned, and that title has been given to Sonnet, which is supposed to be the middle-of-the-road model.
you mean sonnet-3.5 (first edition, second edition)?
I used to agree before one of the "Sonnet" models overtook the best "Opus" model
> It's like a competition to see who can make the goofiest naming conventions.
I'm still waiting for one of them to overflow from version 360 down to One.
Just wait for One X, S, Series X, Series X Pro, Series X Pro with Super Fast Charging 2.0
I completely agree! I'm currently using Gemini's "2.0 Flash Thinking Experimental with apps" model.
Thankfully the names aren’t as bad as how Sony names their products like earphones
https://www.smbc-comics.com/comic/version
Deepseek Reasoner is a pretty good name for a pretty good model I think.. pity the performance is so terrible via the api
What do you use LLMs for at Rev? And separate question: how does your diarization compare to Deepgram or AssemblyAI?
Can you come up with better names?
GPT-2, GPT-3, GPT-4.
Google AI. Google AI 2. Google AI 3. - for google
AIPal. AltmanAI. PayAI. - for Altman
Flash Lite is the least bad.
They are probably using their own LLMs to generate the names.
Try out the new models at https://aistudio.google.com.
It's a great way to experiment with all the Gemini models that are also available via the API.
If you haven't yet, try also Live mode at https://aistudio.google.com/live.
You can have a live conversation with Gemini and have the model see the world via your phone camera (or see your desktop via screenshare on the web), and talk about it. It's quite a cool experience! It made me feel the joy of programming and using computers that I had had so many times before.
Except Gemini multimodal outputs are still under lock and key except for a select few.
Very disappointing to see the claim Gemini 2.0 is available for everyone when it's simply not. Seems like Google is following the OpenAI playbook on this.
For anyone that's parsing PDFs, this is a game changer in terms of cost - I wrote a blog post about it [1]. I think a lot of people were nervous about pricing since they released the beta, and although it's slightly more expensive than 1.5 Flash, this is still incredibly cost-effective. Looking forward to also benchmarking the Lite version.
[1] https://www.sergey.fyi/articles/gemini-flash-2
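If you want to try it yourself, here's a minimal sketch using the google-generativeai Python SDK and its File API (the model id, file name and prompt are placeholders, not something from the blog post):

    import google.generativeai as genai

    genai.configure(api_key="YOUR_API_KEY")  # key from aistudio.google.com

    # Upload the PDF through the File API, then ask the model about it.
    pdf = genai.upload_file(path="invoice.pdf")
    model = genai.GenerativeModel("gemini-2.0-flash")
    response = model.generate_content(
        [pdf, "Extract every line item from this document as JSON."]
    )
    print(response.text)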
I upgraded my llm-gemini plugin to handle this, and shared the results of my "Generate an SVG of a pelican riding a bicycle" benchmark here: https://simonwillison.net/2025/Feb/5/gemini-2/
The pricing is interesting: Gemini 2.0 Flash-Lite is 7.5c/million input tokens and 30c/million output tokens - half the price of OpenAI's GPT-4o mini (15c/60c).
Gemini 2.0 Flash isn't much more: 10c/million for text/image input, 70c/million for audio input, 40c/million for output. Again, cheaper than GPT-4o mini.
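To make that concrete, a rough back-of-the-envelope sketch using the list prices above and a made-up workload of 50k input / 5k output tokens per call:

    # $/million tokens (input, output), as quoted above
    FLASH_LITE = (0.075, 0.30)   # Gemini 2.0 Flash-Lite
    FLASH      = (0.10, 0.40)    # Gemini 2.0 Flash (text/image input)
    GPT4O_MINI = (0.15, 0.60)    # GPT-4o mini

    def cost(prices, input_tokens, output_tokens):
        inp, out = prices
        return (input_tokens * inp + output_tokens * out) / 1_000_000

    for name, p in [("flash-lite", FLASH_LITE), ("flash", FLASH), ("4o-mini", GPT4O_MINI)]:
        print(name, round(cost(p, 50_000, 5_000), 5))
    # flash-lite ~$0.00525, flash ~$0.007, 4o-mini ~$0.0105 per call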
The only benchmark worth paying attention to.
Is there a way to see/compare the shared results for all of the LLMs you've tested this prompt on in one place? The 2.0 Pro result seems decent, but I don't have a baseline to tell whether that's because it actually is decent, or whether the other two are just "extremely bad" or something.
Search by tag: https://simonwillison.net/tags/pelican-riding-a-bicycle/
Not a bad pelican from 2.0 Pro! The singularity is almost upon us :)
The SVGs are starting to look actually recognisable! You'll need a new benchmark soon :)
I've been very impressed by Gemini 2.0 Flash for multimodal tasks, including object detection and localization[1], plus document tasks. But the 15 requests per minute limit was a severe limiter while it was experimental. I'm really excited to be able to actually _do_ things with the model.
In my experience, I'd reach for Gemini 2.0 Flash over 4o in a lot of multimodal/document use cases. Especially given the differences in price ($0.10/million input and $0.40/million output versus $2.50/million input and $10.00/million output).
That being said, Qwen2.5 VL 72B and 7B seem even better at document image tasks and localization.
[1] https://notes.penpusher.app/Misc/Google+Gemini+101+-+Object+...
> In my experience, I'd reach for Gemini 2.0 Flash over 4o
Why not use o1-mini?
Mostly because OpenAI's vision offerings aren't particularly compelling:
- 4o can't really do localization, and ime is worse than Gemini 2.0 and Qwen2.5 at document tasks
- 4o mini isn't cheaper than 4o for images because it uses a lot of tokens per image compared to 4o (~5600/tile vs 170/tile, where each tile is 512x512) - see the quick arithmetic after this list
- o1 has support for vision but is wildly expensive and slow
- o3-mini doesn't yet have support for vision, and o1-mini never did
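A quick sanity check of the 4o mini point, using the per-tile token counts and list prices quoted in this thread (my rough numbers, not official figures):

    # Approximate cost of one 512x512 image tile, in dollars
    gpt4o_mini_tile = 5600 * 0.15 / 1_000_000   # ~$0.00084
    gpt4o_tile      = 170  * 2.50 / 1_000_000   # ~$0.000425
    print(gpt4o_mini_tile / gpt4o_tile)         # ~2x: "mini" is pricier per image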
I use all the top-of-the-line models every day. Not for coding, but for general "cognitive" tasks like research, thinking, analysis, writing etc. What Google calls Gemini Pro 2.0 has been my favorite model for the past couple of months. I think o1/4o come pretty close. Those are kinda equals, with a slight preference for Gemini. Claude has fallen behind, clearly. DeepSeek is intriguing. It excels occasionally where others won't. For consistency's sake, Gemini Pro 2.0 is amazing.
I highly recommend using it via https://aistudio.google.com/. Gemini app has some additional bells and whistles, but for some reason quality isn't always on par with aistudio. Also Gemini app seems to have more filters -- it seems more shy answering controversial topics. Just some general impressions.
do you mean "Gemini 2.0 Pro Experimental 02-05"? too many similarly named models
Yes.
2.0 Pro Experimental seems like the big news here?
> Today, we’re releasing an experimental version of Gemini 2.0 Pro that responds to that feedback. It has the strongest coding performance and ability to handle complex prompts, with better understanding and reasoning of world knowledge, than any model we’ve released so far. It comes with our largest context window at 2 million tokens, which enables it to comprehensively analyze and understand vast amounts of information, as well as the ability to call tools like Google Search and code execution.
It's not that big news, because they already had gemini-exp-1206 on the API - they just didn't say it was Gemini 2.0 Pro until today. Now AI Studio marks it as 2.0 Pro Experimental - basically an older snapshot; the newer one is gemini-2.0-pro-exp-02-05.
Oh so the previous model gemini-exp-1206 is now gemini-2.0-pro-experimental on aistudio? Is it better than gemini-2.0-flash-thinking-exp?
Pricing is CRAZY.
Audio input is $0.70 per million tokens on 2.0 Flash, $0.075 for 2.0 Flash-Lite and 1.5 Flash.
For gpt-4o-mini-audio-preview, it's $10 per million tokens of audio input.
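Rough per-hour arithmetic, assuming Gemini's documented rate of about 32 tokens per second of audio (and assuming, which may not hold, that gpt-4o-mini-audio-preview is billed at a comparable tokens-per-second rate):

    tokens_per_hour = 32 * 3600   # ~115,200 tokens for one hour of audio

    gemini_20_flash  = tokens_per_hour * 0.70  / 1_000_000   # ~$0.08 / hour
    gemini_15_flash  = tokens_per_hour * 0.075 / 1_000_000   # ~$0.009 / hour
    gpt4o_mini_audio = tokens_per_hour * 10.0  / 1_000_000   # ~$1.15 / hour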
The increase is likely because 1.5 Flash was actually cheaper than all other STT services. I wrote about this a while ago at https://ktibow.github.io/blog/geminiaudio/.
I feel that the audio interpreting aspects of the Gemini models aren't just STT. If you give it something like a song, it can give you information about it.
Sadly: "Gemini can only infer responses to English-language speech."
https://ai.google.dev/gemini-api/docs/audio?lang=rest#techni...
I don't know what they mean by this but the obvious interpretation is not true. It understands other languages, it even does really well with low representation languages, in my case Latvian.
Flash is back, baby.
Next release should be called Gemini Macromedia
This is going to send shockwaves through the industry.
.SWF is all we need
Or perhaps it'll help people to weave their dreams together and so it should be called.. ahh I feel old all of a sudden.
Google Frontpage?
You made me feel old ;)
Dreamweaver!
Google Gemini MX 2026
How about Gemini Director for the next agentic stuff.
Gemini Applets
That 1M-token context window alone is going to kill a lot of RAG use cases. Crazy to see how we went from 4K-token context windows (2023 ChatGPT-3.5) to 1M in less than 2 years.
We have heard this before when 100k and 200k were first being normalized by Anthropic way back when and I tend to be skeptical in general when it comes to such predictions, but in this case, I have to agree.
Having used the previews for the last few weeks with different tasks and personally designed challenges, what I found is that these models are not only capable of processing larger context windows on paper, but are also far better at actually handling long, dense, complex documents in full: referencing back to something upon specific request, doing extensive rewrites in full whilst handling previous context, etc. These models have also handled my private needle-in-haystack-type challenges without issues so far, though those have been limited to roughly 200k in fairness. Neither Anthropic's, OpenAI's, DeepSeek's nor previous Google models handled even 75k+ in any comparable manner.
Cost will of course remain a factor and will keep RAG a viable choice for a while, but for the first time I am tempted to agree that someone has delivered a solution which showcases that a larger context window can in many cases work reliably and far more seamlessly.
It is also the first time a Google model has actually surprised me (positively); neither Bard, nor AI answers, nor any previous Gemini model had any appeal to me, even when testing specifically for what others claimed to be strengths (such as Gemini 1.5's alleged Flutter expertise, which got beaten by both OpenAI's and Anthropic's equivalents at the time).
That's not really my experience. Error rate goes up the more stuff you cram into the context, and processing gets both slower and more expensive with the amount of input tokens.
I'd say it makes sense to do RAG even if your stuff fits into context comfortably.
Try exp-1206. That thing works on large context.
> That 1M tokens context window
2M context window on Gemini 2.0 Pro: https://deepmind.google/technologies/gemini/pro/
> is going to kill a lot of RAG use cases.
I have a high level understanding of LLMs and am a generalist software engineer.
Can you elaborate on how exactly these insanely large (and now cheap) context windows will kill a lot of RAG use cases?
If a model has 4K input context and you have a document or code base with 40K, then you have to split it up. The system prompt, user prompt, and output token budget all eat into this. You might need hundreds of small pieces, which typically end up in a vector database for RAG retrieval.
With a million tokens you can shove several short books into the prompt and just skip all that. That’s an entire small-ish codebase.
A colleague used an HTML dump of every config and config policy from a Windows network, pasted it into Gemini and started asking questions. It's just that easy now!
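A minimal sketch of the difference, purely illustrative (character-based chunking standing in for a real tokenizer, and a hypothetical dump file):

    def chunk(text, size=12_000, overlap=500):
        # Naive fixed-size chunking - what a 4K-token window forces you into
        # (sizes here are characters, standing in for tokens).
        step = size - overlap
        return [text[i:i + size] for i in range(0, len(text), step)]

    doc = open("codebase_dump.txt").read()

    # Small-context world: split, embed, store, retrieve top-k per query (RAG).
    pieces = chunk(doc)   # often hundreds of pieces

    # 1M-token world: if it fits, just send the whole thing with the question.
    prompt = doc + "\n\nQuestion: where is the retry logic configured?"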
Maybe someone knows: what's the usual recommendation regarding big context windows? Is it safe to use them to the max, or will performance degrade, meaning we should adapt the maximum to our use case?
Gemini can in theory handle 10M tokens; I remember them saying it in one of their presentations.
Benchmarks or it didn't happen. Anything better than https://lmarena.ai/?leaderboard?
My experience with the Gemini 1.5 models has been positive. I think Google has caught up.
Some of my saved bookmarks:
- https://aider.chat/docs/leaderboards/
- https://www.prollm.ai/leaderboard
- https://www.vellum.ai/llm-leaderboard
- https://lmarena.ai/?leaderboard
Livebench is better. lmarena is a vibes benchmark.
Gemini 2.0 works great with large context. A few hours ago, I posted a ShowHN about parsing an entire book in a single prompt. The goal was to extract characters, relationships, and descriptions that could then be used for image generation:
https://news.ycombinator.com/item?id=42946317
Which Gemini model is notebooklm using atm? Have they switched yet?
Not sure. I am using models/API keys from https://aistudio.google.com. They just added new models, e.g., gemini-2.0-pro-exp-02-05. Exp models are free of charge with some daily quota depending on model.
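If you want to check which model ids your own key can see (and their context limits), a small sketch with the google-generativeai SDK:

    import google.generativeai as genai

    genai.configure(api_key="YOUR_API_KEY")  # key from aistudio.google.com

    for m in genai.list_models():
        if "generateContent" in m.supported_generation_methods:
            print(m.name, m.input_token_limit, m.output_token_limit)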
Updates for Gemini models will always be exciting to me because of how generous the free API tier is; I barely run into limits for personal use. The huge context window is a huge advantage for personal projects, too.
I have a fun query in AI Studio where I pasted an 800,000-token Wuxia martial arts novel and ask it worldbuilding questions.
1.5 pro and the old 2.0 flash experimental generated responses in AI studio but the new 2.0 models respond with blank answers.
I wonder if it's timing out or some sort of newer censorship model is preventing 2.0 from answering my query. The novel is PG-13 at most, but references to "bronze skinned southern barbarians", "courtesans", "drugs", "demonic sects" and murder could, I guess, set it off.
I can’t say I ever expected to see this here, but thank you for making my day!
I might try this as a filler remover for the novels I find drag on and on.
Anyone have a take on how the coding performance (quality and speed) of the 2.0 Pro Experimental compares to o3-mini-high?
The 2 million token window sure feels exciting.
I don't know what those "needle in haystack" benchmarks are testing for, because in my experience dumping a big amount of code into the context doesn't work as you'd expect. It works better if you keep the context small.
I think the sweet spot is to include some context that is limited to the scope of the problem and benefit from the longer context window to keep longer conversations going. I often go back to an earlier message on that thread and rewrite with understanding from that longer conversation so that I can continue to manage the context window
Claude works well for me loading code up to around 80% of its 200K context and then asking for changes. If the whole project can't fit I try to at least get in headers and then the most relevant files. It doesn't seem to degrade. If you are using something like an AI IDE a lot of times they don't really get the 200K context.
Bad (though I haven't tested autocompletion). It's underperforming other models on livebench.ai.
With Copilot Pro and DeepSeek's website, I ran "find logic bugs" on a 1200 LOC file I actually needed code review for:
- DeepSeek R1 found like 7 real bugs out of 10 suggested with the remaining 3 being acceptable false positives due to missing context
- Claude was about the same with fewer remaining bugs; no hallucinations either
- Meanwhile, Gemini had 100% false positive rate, with many hallucinations and unhelpful answers to the prompt
I understand Gemini 2.0 is not a reasoning model, but DeepClaude remains the most effective LLM combo so far.
I have seen Gemini hallucinate ridiculous bugs in a file that had less than 1000 LOC when I was scratching my head over what was wrong. The issue turned out to be that the CBLAS matrix multiplication functions expected column-major indexing while the code assumed row-major indexing.
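For anyone who hasn't hit this class of bug, a tiny NumPy illustration of what row-major vs column-major interpretation does to the same buffer (just the failure mode, not the original code):

    import numpy as np

    buf = np.arange(6, dtype=np.float64)   # the same 6 numbers in memory

    row_major = buf.reshape(2, 3, order="C")  # what the code assumed
    col_major = buf.reshape(2, 3, order="F")  # what a column-major BLAS sees

    print(row_major)   # [[0. 1. 2.], [3. 4. 5.]]
    print(col_major)   # [[0. 2. 4.], [1. 3. 5.]]
    # A matmul on the wrong interpretation silently computes something
    # different - no crash, just wrong numbers.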
I always get to, “You may not use the Services to develop models that compete with the Services (e.g., Gemini API or Google AI Studio).” [1] and exit
- [1] https://ai.google.dev/gemini-api/terms
I wonder what the legal implications of breaking the ToS are, besides termination of your account?
I just ignore that. If I'm ever large enough to be worth suing, I'll be very happy.
They don't need to sue; they'll just ban your account with no warning or explanation the moment you get on their radar... at least that's what they did to me.
I’ve tried Gemini many times and never found it to be useful at all compared to OpenAI and Claude.
I am dreaming about an aggregator site where one can select any model - openai, gemini, claude, llama, qwen... and pay API rate + some profit margin for any query. Without having to register with each AI provider, and without sharing my PII with them.
That’s openrouter.ai
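It's OpenAI-compatible, so the stock openai client works against it; a minimal sketch (the model id is just an example):

    from openai import OpenAI

    # OpenRouter exposes an OpenAI-compatible endpoint: one account, one key,
    # many upstream providers.
    client = OpenAI(
        base_url="https://openrouter.ai/api/v1",
        api_key="OPENROUTER_API_KEY",
    )

    resp = client.chat.completions.create(
        model="google/gemini-2.0-flash-001",
        messages=[{"role": "user", "content": "Summarise the transformer architecture in two sentences."}],
    )
    print(resp.choices[0].message.content)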
Thanks, registering right now.
I am blown away by how bad this is compared to the competition. What is Google doing? I asked it to give me something simple, like 6 results, and it gave me 3, not to mention hallucinated data on queries that ChatGPT and others handle fine.
Worth noting that with 2.0 they're now offering free search tool use for 1,500 queries per day.
Their search costs 7x what Perplexity Sonar's does, but I imagine a lot of people will start with Google, given that they can get a pretty decent amount of search for free now.
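For reference, enabling it looks roughly like this with the newer google-genai Python package, assuming the search tool is wired up the way the docs describe (key placeholder and prompt are illustrative):

    from google import genai
    from google.genai import types

    client = genai.Client(api_key="AI_STUDIO_API_KEY")  # placeholder

    resp = client.models.generate_content(
        model="gemini-2.0-flash",
        contents="What did Google announce about Gemini this week?",
        config=types.GenerateContentConfig(
            tools=[types.Tool(google_search=types.GoogleSearch())],
        ),
    )

    print(resp.text)
    # Sources used for grounding come back alongside the answer.
    print(resp.candidates[0].grounding_metadata)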
Is it still the case that it doesn't really support video input?
As in, I have a video file that I want to send to the model and get a response about. Not their 'live stream' or whatever functionality.
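For what it's worth, the pattern I'd expect to use is the Files API in the Python SDK; a rough sketch I haven't verified against the 2.0 models (key and file names are placeholders):

    import time
    import google.generativeai as genai

    genai.configure(api_key="AI_STUDIO_API_KEY")  # placeholder

    # Upload once through the Files API, wait for server-side processing to
    # finish, then reference the file handle in an ordinary prompt.
    video = genai.upload_file(path="clip.mp4")
    while video.state.name == "PROCESSING":
        time.sleep(5)
        video = genai.get_file(video.name)

    model = genai.GenerativeModel("gemini-2.0-flash")
    resp = model.generate_content([video, "Describe what happens in this clip."])
    print(resp.text)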
I'm interested to know how well video processing works here. I ran into some problems when I was using Vertex to serve longer YouTube videos.
It sure is cool that people who joined Google's Pixel Pass continue to be unable to give them money to access Advanced.
I am on the iOS app and I see Gemini 2.0 and Gemini 1.5 as options in the drop down. I am on free tier
Try the Gemini webapp. It has a powerful reasoning model with Google Search and Maps integration.
If you're Google and you're reading, please offer finetuning on multi-part dialogue.
It sucks, btw. I tried scheduling an event in Google Calendar through Gemini, and it got the date wrong, the time wrong, and the timezone wrong. It set an event that's supposed to be tomorrow to next year.
To be fair, the best human programmers struggle like hell with date math.
Does it still say that water isn’t frozen at 0 degrees Fahrenheit?
Google should release the weights under a MIT license like Deepseek.
They won't, not after OpenAI used the transformer and refused to give anything back.
How is that relevant?
I wish the blog mentioned whether they backported DeepSeek ideas into their model to make it more efficient.
Only DeepSeek is allowed to take ideas from everyone else?
Where can I try coding with it? Is it available in Cursor/Copilot?
Exciting to see these models being released in the Gemini app. I just wish my choice of which model to default to were saved across sessions.
How many tokens can gemini.google.com handle as input? How large is the context window before it forgets? A quick search said it's a 128k-token window, but that applies to Gemini 1.5 Pro; what is it now?
My assumption is that "Gemini 2.0 Flash Thinking Experimental" is just "Gemini 2.0 Flash" with reasoning, and "Gemini 2.0 Flash Thinking Experimental with apps" is just "Gemini 2.0 Flash Thinking Experimental" with access to the web and Google's other services, right? So sticking to "Gemini 2.0 Flash Thinking Experimental with apps" should be the optimal choice.
Is there any reason why Gemini 1.5 Flash is still an option? It feels like it should be removed unless it does something better than the others.
I have difficulty understanding where each variant of the Gemini model is best suited. Looking at aistudio.google.com, they have already updated the available models.
Is "Gemini 2.0 Flash Thinking Experimental" on gemini.google.com just "Gemini experiment 1206" or was it "Gemini Flash Thinking Experimental" aistudio.google.com?
I have a note in my notes app where I rank every LLM on instruction following and math, and to this day I've had difficulty knowing where to place each Gemini model. I know there is a little popup when you hover over each model that tries to explain what it does and which tasks it is best suited for, but these explanations have been very vague to me. And I haven't even started on the Gemini Advanced series, or whatever I should call it.
The available models on aistudio are now:
- Gemini 2.0 Flash (gemini-2.0-flash)
- Gemini 2.0 Flash Lite Preview (gemini-2.0-flash-lite-preview-02-05)
- Gemini 2.0 Pro Experimental (gemini-2.0-pro-exp-02-05)
- Gemini 2.0 Flash Thinking Experimental (gemini-2.0-flash-thinking-exp-01-21)
If I had to sort these from most likely to fulfill my need to least likely, then it would probably be:
gemini-2.0-flash-thinking-exp-01-21 > gemini-2.0-pro-exp-02-05 > gemini-2.0-flash-lite-preview-02-05 > gemini-2.0-flash
Why? Because aistudio describes gemini-2.0-flash-thinking-exp-01-21 as being able to tackle the most complex and difficult tasks, while gemini-2.0-pro-exp-02-05 and gemini-2.0-flash-lite-preview-02-05 only differ in how much context they can handle.
So with that out of the way, how does Gemini-2.0-flash-thinking-exp-01-21 compare against o3-mini, Qwen 2.5 Max, Kimi k1.5, DeepSeek R1, DeepSeek V3 and Sonnet 3.5?
My current list of benchmarks I go through is artificialanalysis.ai, lmarena.ai, livebench.ai, and aider.chat's polyglot benchmark, but still, the whole Gemini suite is difficult to reason about and sort out.
I feel like this trend of having many different models with the same name but different suffixes is starting to become an obstacle to my mental model.
Update: I found that AI Studio's changelog explains where each model fits best.
https://aistudio.google.com/app/changelog#february-5th-2025
https://aistudio.google.com/app/changelog#december-19-2024
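For a programmatic version of that overview, the API will also report each model's own description and token limits; a small sketch with the google-generativeai package (key name is a placeholder):

    import google.generativeai as genai

    genai.configure(api_key="AI_STUDIO_API_KEY")  # placeholder

    # Ask the API for its own catalogue: id, context limits, and blurb per model.
    for m in genai.list_models():
        if "generateContent" in m.supported_generation_methods:
            print(m.name, m.input_token_limit, m.output_token_limit)
            print("    ", m.description)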
Finally, it's here!
how many "r"'s in strrawberry
stumped it
> Gemini 2.0 is now being forced on everyone.
Not to be confused with Project Gemini.
https://geminiprotocol.net/
is this AGI yet?
When will they release Gemini 2.0 Pro Max?
Is there really no standalone app, like ChatGPT/Claude/DeepSeek, available yet for Gemini?
The standalone app is at https://gemini.google.com/app, and is similar to ChatGPT.
You can also use https://aistudio.google.com to use base models directly.
What do you mean by an app? I have a Gemini app on my iPhone.
Presumably any app that is API agnostic works fine.
I'm not sure why you would want an app for each anyways.
Why does no one mention that you must log in with a Google account, with all of the record keeping, cross-correlation, and third-party access implied there?
I wonder how common this is, but my interest in this product is 0 simply because my level of trust and feeling of goodwill for Google almost couldn’t be lower.
Does "everyone" here means "users with google accounts"?
It’s funny, I’ve never actually used Gemini and, though this may be incorrect, I automatically assume it’s awful. I assume it’s awful because the AI summaries at the top of Google Search are so awful, and that’s made me never give Google AI a chance.
I don't think your take is incorrect. I give it a try from time to time and it's always been inferior to other offerings for me every time I've tested it. Which I find a bit strange as NotebookLM (until recently) had been great to use. Whatever... there are plenty of other good options out there.
The huge context window is a big selling point.
This is a lie, since I don't have a Google account and can't search on Google anymore, because NoScript/basic (X)HTML browser interop was broken a few weeks ago.
Here's me not using Gemini, because the only use case I have for the old Assistant is setting a timer, and there are reports that Gemini is randomly incapable of setting one.
How does the release of an LLM API relate to the assistant?
Pixels replaced Assistant with Gemini a while back and it was horrendous; it would answer questions but not perform the basic tasks you actually used Assistant for (setting timers, navigating, home control, etc.).
Seems like they're approaching parity (finally), months and months later (alarms/TV control at least work now), but losing basic, oft-used functionality is a serious fumble.