How Google got its groove back and edged ahead of OpenAI

1 day ago (wsj.com)

I don't get the Gemini 3 hype... yes, it's their first usable model, but it's not even close to what Opus 4.5 and GPT 5.2 can do.

Maybe on benchmarks... but I'm forced to use Gemini at work every day, while I use Opus 4.5 / GPT 5.2 privately every day... and Gemini is just lacking so much wit, creativity, and multi-step problem-solving skill compared to Opus.

Not to mention that Gemini CLI is a pain to use - after getting used to the smoothness of Claude Code.

Am I alone with this?

  • I cancelled my ChatGPT subscription because of Gemini 3, so obviously I'm having a different experience.

    That said, I use Opus 4.5 for coding through Cursor.

    Gemini is for planning / rubber ducking / analysis / search.

    I seriously find it a LOT better for these things.

    ChatGPT has this issue where, when it doesn't know the explanation for something, it often won't hallucinate outright, but creates some long-winded, confusing word salad that sounds like it could be right but you can't quite tell.

    Gemini mostly doesn't do that and just gives solid scientifically/ technically grounded explanations with sources much of the time.

    That said, it's a bit of a double-edged sword, since it also tends to make confident statements extrapolating from the sources in ways that aren't entirely supported but tend to be plausible.

    • > ChatGPT has this issue where, when it doesn't know the explanation for something, it often won't hallucinate outright, but creates some long-winded, confusing word salad that sounds like it could be right but you can't quite tell.

      This is just hallucinating.

    • Fully agree. ChatGPT is often very confident and tells me that X and Y are absolutely wrong in the code. It then answers with something worse... It also rarely says "sorry, I was wrong" when the previous output was just plain lies. You really need to verify every answer because it is so confident.

      I fully switched to Gemini 3 Pro. Looking into an Opus 4.5 subscription too.

      My GF, on the other hand, prefers ChatGPT for writing tasks quite a lot (she's a school teacher, grades 1-4).

    • +1, canceled all OpenAI and switched to Gemini hours after it dropped. I was tired of vaporware AI, facts obfuscated by hallucinations, and promises of future improvements.

      And then there is pricing too…

    • I think it is proving to be the case that there isn't much stickiness in your chat provider. OpenAI thought memory might bring that, but honestly it can be annoying when random things from earlier chats pollute the current one.

    • I also cancelled ChatGPT Plus recently in favour of Gemini. The only thing I don't like about the Gemini consumer product is its insistence on giving YouTube links and thumbnails as sources. I've tried to use a rule to prevent it, without luck.

      1 reply →

    • Hah, it's funny because I actually cancelled my Gemini subscription to switch full time to ChatGPT about 6 months ago, and now I've done the reverse - Gemini just feels better at the tasks that I'm doing day to day. I think we're just going to see that kind of back and forth for a while as these systems evolve.

  • Full time Antigravity user here, IMO best value coding assistant by far, not even including all the other AI Pro sub perks.

    Still using Claude Pro / GitHub Copilot subs for general terminal/VS Code access to Claude. I consider them all top-tier models, but I prefer the full IDE UX of Antigravity over the VS Code CC sidebar or CC terminal.

    Opus 4.5 is obviously great at all things code, though a lot of the time I prefer the UIs Gemini 3 Pro (High) produces. In the last month I've primarily used it on a Python / Vue project, which it excels at; I thought I'd need to switch to Opus at some point if I wasn't happy with a particular implementation, but I haven't yet. The few times it didn't generate the right result were due to a prompt misunderstanding, which I was able to fix by reprompting.

    I'm still using Claude/GPT 5.2 for docs as IMO they have a more sophisticated command over the English language. But for pure coding assistance, I'm a happy Antigravity user.

    • Antigravity is really amazing, yeah - by far the best coding assistant IDE. It's even superior to Cursor, ngl, when it comes to very complex tasks; it's more methodical in its approach.

      That said I still use Cursor for work and Antigravity sometimes for building toy projects, they are both good.

      5 replies →

    • Looks like Codex + Antigravity (which gives Opus, too) for $40/mo is the sweet spot for the busy hobbyist… today, anyway. It could change this afternoon.

  • For general researching/chatbot, I don't feel one of them is much better than the other. But since I'm already on Google One plan, upgrading the plan costs less than paying $20/mo to OpenAI, so I ended up cancelling ChatGPT Plus. Plus my Google One is shared with my family so they can also use advanced Gemini models.

    • Yes, same thing, also I find Gemini to be better at search and non-coding tasks - which was my only use case for GPT - coding was always Claude.

  • No, not alone, I find GPT far preferable when it comes to fleshing out ideas. It is much deeper conceptually, it understands intent and can cross pollinate disparate ideas well. Gemini is a little more autistic and gets bogged down in details. The API is useful for high volume extraction jobs, though — Gemini API reliability has improved a lot and has lower failure rate than OpenAI IME.

  • That may be your personal experience, but for me Gemini always answers my questions better than Claude Opus 4.5 and often better than GPT 5.2. I'm not talking about coding agents, but rather the web-based AI systems.

    This has happened enough times now (I run every query on all 3) that I'm fairly confident that Gemini suits me better now. Whereas it used to be consistently dead last and just plain bad not so long ago. Hence the hype.

    • Weird. I find Opus knows the answer more often, plus its explanations are much clearer. Opus puts the main point at the top, while Gemini wanders around for a while before telling you what you need.

  • Don't use it on gemini.google.com; instead, try it on aistudio.google.com.

    The model may be the same, but the agent on AI Studio makes it much better when it comes to generating code.

    Still, jules.google.com is far behind actual coding agents that you can run on the command line.

    Google, as always, has over-engineered its stuff to the point of confusing end users.

    • I tried to sign up for Gemini this weekend but gave up after an hour. I got stuck comparing their offerings, looking for product pages, the proper signup, etc. Their product offering and naming is just a mess. Cloud Console, AI Studio... I was completely lost at some point.

      2 replies →

    • I'm almost positive that using gemini on ai studio is the cause for a lot of strife.

      Most users on it are using it free, and they almost certainly give free users bottom priority/worst compute allocation.

      1 reply →

    • I don't understand... let's say I have it build some code for me - am I supposed to copy all those files out to my file system and then test it out? And then if I make changes to the source, I need to copy the source back into AI Studio (or Canvas in Gemini)?

      6 replies →

  • I dunno about Gemini CLI, but I have tried Google Antigravity with Gemini 3 Pro and found it extremely superior at debugging versus the other frontier models. If I threw it at a really, really hard problem, I always expected it to eventually give up, get stuck in loops, delete a bunch of code, fake the results, etc. like every other model and every other version of Gemini always did. Except it did not. It actually would eventually break out of loops and make genuine progress. (And I let it run for long periods of time. Like, hours, on some tricky debugging problems. It used gdb in batch mode to debug crashes, and did some really neat things to try to debug hangs.)

    As for wit, well, not sure how to measure it. I've mainly been messing around with Gemini 3 Pro to see how it can work on Rust codebases, so far. I messed around with some quick'n'dirty web codebases, and I do still think Anthropic has the edge on that. I have no idea where GPT 5.2 excels.

    If you could really compare Opus 4.5 and GPT 5.2 directly on your professional work, are you really sure it would work much better than Gemini 3 Pro? i.e. is your professional work comparable to your private usage? I ask this because I've really found LLMs to be extremely variable and spotty, in ways that I think we struggle to really quantify.

  • This may sound backwards, but Gemini 3 Flash is quite good when given very specific tasks. It's very fast (much faster than Opus and GPT 5.2), follows instructions very well, and spits out working code (in contrast to other fast models like Flash and Haiku variants).

    It does need a solid test suite to keep it in check, but you can move very fast if you have well-defined small tasks to give it. I write a PRD, then break it down into epics, stories, and finally tasks with Pro first. Works very well.

  • When I had a problem with video handoff between one Linux kernel and the next with a zfsbootmenu system, only Gemini was helpful. ChatGPT led me on a merry chase of random kernel flags that didn't have the right effect.

    What worked was rebuilding the Ubuntu kernel with a disabled flag enabled, but it took too long to get that far.

  • I mean, I'm the exact opposite. Ask ChatGPT to write a simple (but novel) script for AutoHotKey, for example, and it can't do it. Gemini can do it perfectly on the first try.

    ChatGPT has been atrocious for me over the past year, as in its actual performance has deteriorated. Gemini has improved with time. As for the comment about lacking wit, I mean, sure I guess, but I use AI to either help me write code to save me time or to give me information - I expect wit out of actual humans. That shit just annoys me with AI, and neither ChatGPT nor Gemini bots are good at not being obnoxious with metaphors and floral speech.

    • Sounds like you're using ChatGPT to spit out a script in the chat? If so, you should give GPT 5.2 Codex or Claude Code with Opus 4.5 a try... it's night and day.

      11 replies →

  • I've been using both GPT 5.2 and Gemini 3 Pro a lot. I was very impressed with 3 Pro when it came out, and thought I'd cancel my OAI Plus, but I've since found that for important tasks it's been beneficial to compare the results from both, or even bounce between them. They're different enough that it's like collaborating with a team.

  • You’re not alone. I do a small blog reviewing LLMs and have detailed comparisons that go beyond personal anecdotes. Gemini struggles in many use cases.

    Everyone has to find what works for them and the switching cost and evaluation cost are very low.

    I see a lot of comments generally with the same pattern “i cancelled my LEADER subscription and switched to COMPETITOR”… reminiscent of astroturf. However I scanned all the posters in this particular thread and the cancellers do seem like legit HN profiles.

  • I love Gemini. Why would I want my AI agent to be witty? That's the exact opposite of what I am looking for. I just want the correct answer with as little fluff and nonsense as possible.

  • People get used to a model and then work best with that model.

    If you hand an iPhone user an Android phone, they will complain that Android is awful and useless. The same is true vice versa.

    This is in large part why we get so many conflicting reports of model behavior. As you become more and more familiar with a model, especially if it is in fact a good model, other good models will feel janky and broken.

  • > Not to mention that Gemini CLI is a pain to use - after getting used to the smoothness of Claude Code.

    Are you talking strictly about the respective command line tools as opposed to differences in the models they talk to?

    If so, could you list the major pain points where Gemini CLI falls short of Claude Code?

  • Claude Code > Gemini CLI, fair enough

    But I actually find Gemini Pro (not the free one) extremely capable, especially since you can throw any conversation into NotebookLM and deep thinking mode to go in depth.

    Opus is great, especially for coding and writing, but for actual productivity outside of that (e.g. working with PDFs, images, screenshots, design stuff like marketing, t-shirts, ...) I prefer Gemini. It's also the fastest.

    Nowhere do I feel like GPT 5.2 is as capable as these two, although admittedly I just stopped using it frequently around November.

  • Opus > GPT 5.2 | Gemini 3 Pro to me, but they've been pretty close lately; the gap is smaller now. I'm using them via CLI. For Gemini, their CLI is pretty bad IMO, so I'm using it via OpenCode and I'm pretty happy with it so far. Unfortunately, Gemini often throws me rate-limit errors and occasionally hangs. Their infra is not really reliable, ironically. But other than that, it's been great so far.

  • In my experience, Gemini is great for "one-shot" work, and is my goto for "web" AI usage. Claude Code beats gemini-cli though. Gemini-cli isn't bad, but it's also not good.

    I would love to try Antigravity out some more, but as of my last look I don't think it's out of the playground stage yet, and it can't be used for anything remotely serious AFAIK.

  • The Gemini voice app on iOS is unimpressive. They force the answers to be so terse to save cost that it’s almost useless. It quickly goes in circles and needs context pruning. I haven’t tried a paid subscription for Gemini CLI or whatever their new shiny is but codex and Claude code have become so good in the last few months that I’m more focused on using them than exploring options.

  • Nope. In a coding context, Claude and Codex are a combo that really shines, and Gemini is pretty useless. The only thing I actually use it for is to triple-check the specifications sometimes, and that's pretty much it.

  • I haven't straight up cancelled my ChatGPT subscription, but I find that I use Gemini about 95% of the time these days. I never bother with any of Anthropic's stuff, but as far as OpenAI models vs Gemini, they strike me as more or less equivalent.

  • I've found that for any sort of reasonable task, the free models are garbage and the low-tier paid models aren't much better. I'm not talking about coding, just general "help me" usage. It makes me very wary of using these models for anything that I don't fully understand, because I continually get easily falsifiable hallucinations.

    Today, I asked Gemini 3 to find me a power supply with some spec; AC/DC +/- 15V/3A. It did a good job of spec extraction from the PDF datasheets I provided, including looking up how the device performance would degrade using a linear vs switch-mode PSU. But then it comes back with two models from Traco that don't exist, including broken URLs to Mouser. It did suggest running two Meanwell power supplies in series (valid), but 2/3 suggestions were BS. This sort of failure is particularly frustrating because it should be easy and the outputs are also very easy to test against.

    Perhaps this is where you need a second agent to verify and report back, so a human doesn't waste the time?

  • Gemini really only shines when using it for planning inside the VS Code fork Antigravity. It also supports Opus, so it's easy to compare.

  • You're not alone, I feel like sometimes I'm on crazy pills. I have benchmarks at work where the top models are plugged into agents, and Gemini 3 is behind Sonnet 4. This aligns closely with my personal usage as well, where Gemini fails to effectively call MCP tools.

    But hey, it's cheapish, and competition is competition

  • Yeah, you are. You're limiting your view to personal use and just the text modality. If you're a builder or running a startup, the price-performance on Gemini 3 Pro and Flash is unmatched, especially when you factor in the quotas needed for scaled use cases. It’s also the only stack that handles text, live voice, and gen-media together. The Workspace/Gmail integration really doesn't represent the raw model's actual power.

    • Depending on Google’s explicit product to build a startup is crazy. There is a risk of them changing APIs or offerings or features without the ability to actually complain, they are not a great B2B company.

      I hope you just use the API and can switch easily to any other provider.

  • I've only used AI pretty sparingly, and I just use it from their websites, but last time I tried all 3 only the code Google generated actually compiled.

    No idea which version of their models I was using.

  • gemini 2.0 flash is and was a godsend for many small tasks and ocr.

    There needs to be a greater distinction between models used for human chat, programming agents, and software-integration - where at least we benefitted from gemini flash models.

  • Claude Opus is absurdly amazing. I now spend around $100-200 a day using it. Gemini and all the OpenAI models can’t keep up right now.

    Having said that, Google is killing it at image editing right now. Makes me wonder if that’s because of some library of content, and once Anthropic acquires the same they’ll blow us away there too.

    • API only user or Max x20 along with extra usage? If it's the latter, how are the limits treating you?

    • > I now spent around $100-200 a day using it.

      Really? Are you running multiple agents at a time? I'm on Microsoft's $40/mo plan and even using Opus 4.5 all day (one agent at a time), I'm not reaching the limit.

  • I also get weirdly agitated by this. In my mind, Gemini 3 is a case of clear benchmaxing and, overall, a massive flop.

    I am currently testing different IDEs, including Antigravity, and I avoid that model at all costs. I would rather pay to use a different model than use Gemini 3.

    It sucks at coding compared to OpenAI and Anthropic models, and it is not clearly better as a chatbot (though I like the context window). The images are the best part of it, as it is very steerable and fast.

    But WTF? This was supposed to be the OpenAI killer model? Please.

  • Have you used it as a consumer would? Aka in Google search results or as a replacement for ChatGPT? Because in my hands it is better than ChatGPT.

  • I've started using Gem 3 while things are still in flux in the AI world. Pleasantly surprised by how good it is.

    Most of my projects are on GPT at the moment, but we're not so far gone that I can't move to others.

    And considering just the general nonsense of Altman vs Musk, I might go to Gemini as a safe harbour (yes, I know how ridiculous that sounds).

    So far, I've also noticed less ass-kissing by the Gemini robot ... a good thing.

  • > Not to mention that Gemini CLI is a pain to use - after getting used to the smoothness of Claude Code.

    Claude Code isn't actually tied to Claude, I've seen people use Claude Code with gpt-oss-120b or Qwen3-30b, why couldn't you use Gemini with Claude Code?
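    For what it's worth, a common way people do this is by pointing Claude Code at a local proxy that translates the Anthropic Messages API to another backend. This is only a sketch of that community pattern - the proxy choice (e.g. LiteLLM), its port, and the key value below are assumptions, not an officially supported Gemini setup:

```shell
# Hypothetical setup: a local proxy (e.g. LiteLLM) listens on port 4000,
# speaks the Anthropic Messages API to Claude Code, and forwards requests
# to a different model provider behind the scenes.
export ANTHROPIC_BASE_URL="http://localhost:4000"  # route requests through the proxy
export ANTHROPIC_AUTH_TOKEN="placeholder-key"      # the proxy holds the real provider key
claude                                             # launch Claude Code as usual
```

    The proxy does the heavy lifting here; Claude Code just sees an Anthropic-shaped endpoint, so results depend on how faithfully the proxy maps tool calls across APIs.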

  • I'm with you - the most disappointing part was when I asked Gemini, technically Nano Banana, for a PNG with a transparent background and it just approximated what a transparent PNG would look like in an image viewer, as an opaque background. ChatGPT has no problem with this. I also appreciate that it can use content like Disney characters. And as far as actual LLMs go, the text is just formatted more readably in GPT to me, with fairly useful application of emojis. I also had an experience asking for tax-reporting-type advice, same prompt to both. GPT gave the correct response; Gemini suggested cutting corners in a grey way and eventually agreed that GPT's response was safer and better to go with.

    It just feels like OpenAI puts a lot of effort into creating an actually useful product while Gemini just targets benchmarks. Targeting benchmarks to me is meaningless since every model, gpt, Gemini, Claude, constantly hallucinate in real workloads anyways.

While this article is just about optics, I would say the comments here about how the coding agents fare fail to realize we’re just a niche when compared to the actual consumer product that is the Chatbots for the average user.

My mom is not gonna use Claude Code, it doesn’t matter to her. We, on Hacker News, don’t represent the general population.

  • Claude Code purportedly has over a billion dollars in revenue.

    In terms of economic value, coding agents are definitely one of the top-line uses of LLMs.

      Sure, I don’t disagree, but the fact remains that $1B is less than 10% of OpenAI’s revenue with ChatGPT and its 700M+ user base.

      Coding agents are important, they matter, my comment is that this article isn’t about that, it’s about the other side of the market.

      1 reply →

    • Reminder that the entire AI industry is loaning itself money to boost revenue.

      I seriously question any revenue figures that tech companies are reporting right now. Nobody should be believing anything they say at this time. Fraud is rampant and regulation is non-existent.

      2 replies →

    • Claude has been measurably worse than other models, in my experience. This alone makes me doubt the number. That, and Anthropic has not released official public financial statements, so I'll just assume it's the same kind of hand-waving heavily leveraged companies tend to do.

      I actually pay for ChatGPT and my company pays for Copilot (which is meh).

      Edit: Given other community opinions, I don't feel I'm saying anything controversial. I have noted HN readers tend to be overly bullish on it for some reason.

      2 replies →

  • My mom uses the Google app instead of just going to Google.com on Safari. She’s probably going to use Gemini because she’s locked into that ecosystem. I suspect most people are going to stick with what they use because like you said, to consumers, they can’t really tell the difference between each model. They might get a 5% better answer, but is that worth the switching costs? Probably not.

    That’s why you see people here mention Claude Code or other CLI where Gemini has always fallen short. Because to us, we see more than a 5% difference between these models and switching between these models is easy if you’re using a CLI.

    It’s also why this article is generated hype. If Gemini was really giving the average consumer better answers, people would be switching from ChatGPT to Gemini but that’s not happening.

  • Will your mom pay for ChatGPT, or just stop using it when they try to start converting more users?

  • Coding agents seem the most likely to eventually become the general-purpose agents that everyone uses for daily work. They have the most mature and comprehensive capability around tool use, especially on the filesystem, but also in opening browsers, searching the web, running programs (via the command line), etc. Their current limitations for widespread usage are UX and security, but at least the latter is being worked on.

    I just helped a non-technical friend install one of these coding agents, because it's the best way to use an AI model today that can do more than give him answers to questions.

  • AI coding has massive factors that should make it the easiest to drive adoption and monetize.

    The biggest is FOMO. So many orgs have a principal-agent problem where execs are buying AI for their whole org, regardless of value. This is easier revenue than nickel-and-diming individuals.

    The second factor is the circular tech economy. Everyone knows everyone, everyone is buying from everyone, it's the same dollar changing hands back and forth.

    Finally, AI coding should be able to produce concrete value. If an AI makes code that compiles and solves a problem it should have some value. By comparison, if your product is _writing_, AI writing is kind of bullshit.

    • > If an AI makes code that compiles and solves a problem it should have some value

      Depends if the cost to weed out the new problems it introduces outweighs the value of the problems solved.

    • I've got to wonder what the potential market size is for AI driven software development.

      I'd have to guess that competition and efficiency gains will reduce the cost of AI coding tools, but for now we've got $100 or $200/mo premium plans for things like Claude Code (although some users may exceed this and pay more), call it $1-2K/yr per developer, and in the US there are apparently about 1M developers, so even with a 100% adoption rate that's only $1-2B revenue spread across all providers for the US market.... a drop in the bucket for a company like Google, and hardly enough to create a sane Price-to-Sales ratio for companies like OpenAI or Anthropic given their sky-high valuations.

      Corporate API usage seems to have potential to be higher (not capped by a fixed size user base), but hard to estimate what that might be.

      ChatBots don't seem to be viable for long-term revenue, at least not from consumers, since it seems we'll always have things like Google "AI Mode" available for free.
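      The back-of-envelope market-size math above can be sketched directly; the $100-200/mo plan prices and ~1M US developers are the comment's own assumptions, not hard data:

```python
# Rough TAM estimate for US AI-coding subscriptions, using the
# comment's assumed figures: $100-200/mo plans, ~1M US developers,
# 100% adoption (an upper bound, not a forecast).
low_annual = 100 * 12    # $1,200/yr per developer
high_annual = 200 * 12   # $2,400/yr per developer
developers = 1_000_000

low_tam = developers * low_annual      # 1.2 billion
high_tam = developers * high_annual    # 2.4 billion
print(f"US subscription TAM: ${low_tam / 1e9:.1f}B - ${high_tam / 1e9:.1f}B per year")
# → US subscription TAM: $1.2B - $2.4B per year
```

      Even at full adoption, the ceiling lands in the low single-digit billions for this segment, which is the comment's point about valuations.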

  • The other issue with this is that AI is still unprofitable and a money hole.

    If consumers refuse to pay for it, let alone more than $20 for it, coding agent costs could explode. Agent revenue isn’t nearly enough to keep the system running while simultaneously being very demanding.

    • AI development is a money pit, AI use is profitable. Average ChatGPT subscribers are using way less than $20 of electricity and GPU time per month.

      2 replies →

> Naina Raisinghani needed a name for the new tool to complete the upload. It was 2:30 a.m., though, and nobody was around. So she just made one up, a mashup of two nicknames friends had given her: Nano Banana.

Ah, that explains the silly name for such an impressive tool. I guess it's a more Googley name than what would otherwise have been chosen: Google Gemini Image Pro Red for Workspace.

  • Strongly disagree.

    Google, OpenAI, and Microsoft all have a very confusing product naming strategy where it’s all lumped under Gemini/ChatGPT/Copilot, and the individual product names are not memorable and really quite obscure. (What does Codex do again?)

    Nano Banana doesn’t tell you what the product does, but you sure remember the name. It really rolls off the tongue, and it looks really catchy on social media.

  • I honestly love the name Nano Banana. It's stupid as hell, but it's a bit of joy to say, especially with how corporate everything is, name-wise, these days.

Google-as-the-new-Microsoft feels about right. Windows 1 was a curiosity, 2 was “ok”, and 3.x is where it started to really win. Same story with IE: early versions were a joke, then it became “good enough” + distribution did the rest.

Gemini 3 feels like Google’s “Windows 3 / IE4 moment”: not necessarily everyone’s favorite yet, but finally solid enough that the default placement starts to matter.

If you are the incumbent you don't need to be all that much better. Just good enough and you win by default. We'll all end up with Gemini 6 (IE 6, Windows XP) and then we'll have something to complain about.

A bit of PR puffery, but it is fair to say that between Gemini and others it’s now been clearly demonstrated that OpenAI doesn’t have any clear moat.

  • Their moat in the consumer world is the branding and the fact that OpenAI has 'memory', which you can't migrate to another provider.

    That means responses can be far more tailored - it knows what your job is, knows where you go with friends, knows that when you ask about 'dates' you mean romantic relationships and which ones are going well or badly not the fruit, etc.

    Eventually, when they make it work better, OpenAI can be your friend and confidant, and you wouldn't dump your friend of many years to make a new friend without good reason.

    • I really think this memory thing is overstated on Hacker News. This is not something that is hard to move at all. It's not a moat. I don't think most users even know memory exist outside of a single conversation.

      14 replies →

    • What kind of a moat is that? I think it only works in abusive relationships, not consumer economies. Is OpenAI's model being an abusive, money-grubbing partner? I suppose it could be!

      3 replies →

    • You can prompt the model to dump all of the memory into a text file and import that.

      In the onboarding flow, I can ask you, "Do you use another LLM?" If so, give it this prompt and then give me the memory file that outputs.

    • > Their moat in the consumer world is the branding and the fact open ai has 'memory' which you can't migrate to another provider.

      Branding isn't a moat when, as far as the mass market is concerned, you are 2 years old.

      Branding is a moat when you're IBM, Microsoft (and more recently) Google, Meta, etc.

    • > Their moat in the consumer world is the branding and the fact open ai has 'memory' which you can't migrate to another provider

      This sounds like first-mover advantage more than a moat.

      5 replies →

    • It's certainly valuable but you can ask Digg and MySpace how secure being the first mover is. I can already hear my dad telling me he is using Google's ChatGPT...

    • > Their moat in the consumer world is the branding and the fact open ai has 'memory' which you can't migrate to another provider.

      Their 'memory' is mostly unhelpful and gets in the way. At best it saves you from prompting some context, but more often than not it adds so much irrelevant context that it overfits responses so hard it makes them completely useless, especially in exploratory sessions.

    • I just learned Gemini has "memory" because it mixed its response to a new query with a completely unrelated query I had beforehand, despite making separate chats for them. It responded as if they were the same chat. Garbage.

      2 replies →

    • Couldn't you just ask it to write down what it knows about you and copy paste into another provider?

  • The next realization will be that Claude isn't clearly(/any?) better than Google's coding agents.

    • Claude is cranked to the max for coding and specifically agentic coding and even more specifically agentic coding using Claude Code. It's like the macbook of coding LLMs.

    • Claude Code + Opus 4.5 is an order of magnitude better than Gemini CLI + Gemini 3 Pro (at least, last time I tried it).

      I don't know how much secret sauce is in CC vs the underlying model, but I would need a lot of convincing to even bother with Gemini CLI again.

      1 reply →

    • I think Gemini 3.0 the model is smarter than Opus 4.5, but Claude Code still gives better results in practice than Gemini CLI. I assume this is because the model is only half the battle, and the rest is how good your harness and integration tooling are. But that also doesn't seem like a very deep moat, or something Google can't catch up on with focused attention, and I suspect by this time next year, or maybe even six months from now, they'll be about the same.

      1 reply →

Gemini 3 is great; I have moved from GPT and haven't looked back. However, like many great models, I suspect it's expensive to run, and eventually Google will nerf the model once it gains enough traction - by distillation, quantizing, or smaller context windows - in order to stop bleeding money.

Here is a report (whether true or not) of it happening:

https://www.reddit.com/r/GeminiAI/comments/1q6ecwy/gemini_30...

  • While I don't use Gemini, I'm betting they'll end up being the cheapest in the future because Google is developing the entire stack, instead of relying on GPUs. I think that puts them in a much better position than other companies like OpenAI.

    https://cloud.google.com/tpu

My guess is the following:

Google can afford to run Gemini for a long time without any ads, while OpenAI necessarily needs to bring in revenue at some point. So OpenAI will have to do something (or they believe they can raise money indefinitely).

Google can easily offer Gemini ad-free to users for the next 3-4 years, forcing OpenAI to cripple its product with ads earlier out of sheer need for revenue.

I think Google & Anthropic will be the two winners; not sure about OpenAI, Perplexity & Co. Maybe OpenAI will somehow merge with Microsoft?

  • I’m surprised Perplexity isn’t already dead! Makes me question my ability to evaluate the value/sticking power of these tools.

    (Unless it is dead if we could see DAUs…)

    • My experience is that Perplexity is slightly better at providing facts than ChatGPT (in default mode), probably because (almost) everything comes from a source, not just the model's training set. Although Perplexity does mess up numbers as well.

      My most recent experiment:

      How many Google CEOs there have been?

      Followed by the question

      So 3 CEOs for 27 years. How does that number compare to other companies of this size

      ChatGPT just completely hallucinates the answer -- 5 Microsoft CEOs over 50 years, 3 Amazon CEOs over 30 years, 2 Meta CEOs over 20 years which are just obviously wrong. You don't need to do a search to know these numbers -- they are definitely in the training dataset (barring the small possibility that there has been a CEO change in the past year in any of these companies, which apparently did not happen)

      But Perplexity completely nailed it on first attempt without any additional instructions.

    • I use Perplexity all the time for search. It's very good at exactly that - internet search. So when using it for search related things it really shines

      Yeah sure ChatGPT can spam a bunch of search queries through their search tool but it doesn't really come close to having Perplexity's search graph and index. Their sonar model is also specifically built for search

    • The thing is: Perplexity is quite good for some things, but it has no traction outside of tech. From what I've seen and heard, most non-techies haven't even heard of Gemini (despite using the Google Search AI Overview every day).

  • Microsoft will fund OpenAI for as long as it's needed. What is their alternative?

    • You are right:

      How long will they do it? I'd expect investors to push back some day. Will MS fund them indefinitely just for the sake of "staying in the game"? According to the public numbers on users, investments, scale, etc., OpenAI will need huge amounts of money in the coming years, not just "another 10 billion". That's my understanding, at least.

I like using gemini because it's so much cheaper when I'm running tests on enact protocol. I ask it to build multiple tools and let it run.

What CRT standard is this meant to be emulating? It can't be NTSC; it's too clean. Red would never display that cleanly: red was infamous for bleeding as the saturation increased. I never had much experience with true PAL, in that I've only ever seen PAL at 60Hz, so I'm not sure if it had the same bleeding-red issue.

It's these kinds of details that can really set your yet-another-emulator apart.

I don't think it's really "ahead" but it's pretty close now. There's not that big a difference among the SOTA models, they all have their pros/cons.

  • It’s incredibly impressive to see a large company with over 30x as many employees as OAI (or 2x if you compare with GDM) step back into the AI race, compared to where they were with Bard a few years ago.

    Google has proved it doesn’t want to be the next IBM or Microsoft.

    • "the next IBM or Microsoft."

      Actually Microsoft has also shown it doesn't want to be the next IBM. I think at this point Apple is the one where I have trouble seeing a long-term plan.

      3 replies →

    • Why are people so surprised? Attention Is All You Need was authored by Googlers; it’s not like they were blindsided. OpenAI productionized it first, but it didn’t make sense to count Google out given their AI history.

      1 reply →

    • You should compare the number of top AI scientists each company has. I think those numbers are comparable (I’m guessing each has a couple of dozen). Also how attractive each company is to the best young researchers.

    • We're talking about code generation here but most people's interactions with LLMs are through text. On that metric Google has led OpenAI for over a year now. Even Grok in "thinking" mode leads OpenAI

      https://lmarena.ai/leaderboard

      Google also leads in image-to-video, text-to-video, search, and vision

  • Google, as a company, is easily ahead even if the model isn't, for various reasons.

    Their real moat is cost efficiency and the ad business. They can (probably) justify the AI spend and stay solvent longer than the market can stay irrational.

Hi Gemini, I’ve booked some tickets for the theater. Please look in my mailbox, schedule it in my calendar, and confirm my plans for next week.

Being able to use natural language to work with my mail and calendar made me switch to the Gemini app; there’s no way to achieve that with the ChatGPT app.

Gemini is now good enough, even if I prefer ChatGPT.

I only care about what I can do in the app as a paying customer, even though, aside from that, I work in IT with the SDK, OpenRouter & MCP integrations, RAG & the rest for work.

All of this seems like manufactured hype for Gemini. I use GPT-5.2, Opus 4.5, and Gemini 3 flash and pro with Droid CLI and Gemini is consistently the worst. It gets stuck in loops, wants to wipe projects when it can’t figure out the problem, and still fails to call tools consistently (sometimes the whole thread is corrupted and you can’t rewind and use another model).

Terminal Bench supports my findings, GPT-5.2 and Opus 4.5 are consistently ahead. Only Junie CLI (Jetbrains exclusive) with Gemini 3 Flash scores somewhat close to the others.

It’s also why Ampcode made Gemini the default model and quickly backtracked when all of these issues came to light.

  • Claude for writing the code, Codex for checking the code, Gemini for when you want to look at a pretty terminal UI.

    • Gemini is pretty decent at ingesting and understanding a large codebase, then handing that context to Claude.

  • I'm pretty high on Claude, though not an expert on coding or LLMs at all

    I'm naturally inclined to dislike Google from what they censor, what they consider misinformation, and just, I don't know, some of the projects they run (many good things, but also many dead projects and lying to people)

It's funny how companies have a stable DNA: Google comes from university research and continues to be good at research-y things, OTOH customer service...

Gemini CLI is too slow to be useful; I'm kind of surprised it was even offered and marketed given how painful it is to use. I'd have thought it would damage the Gemini brand to have people try it out, suffer the painful UX, then immediately stop using it. (Using it from Australia may also contribute to the slow perf.)

Antigravity was also painful to use at launch, when more queries failed than succeeded. However, they've basically solved that now, to the point where it's become my most used editor/IDE. I've yet to hit a quota limit despite only being on the $20/mo plan, even when using Gemini 3 Pro as the default model, and I can't recall seeing any failed service responses after a month of full-time usage. It's not the fastest model, but I'm very happy with its high-quality output.

I expected to upgrade to a Claude Code Max plan after leaving Augment Code, but given how good Antigravity is now for its low cost, I've switched to it as my primary full-time coding assistant.

Still paying for GitHub Copilot / Claude Pro for general VS Code and CC terminal usage, but definitely getting the most value out of my Gemini AI Pro sub.

Note this is only for development, docs, and other work product. For API usage in products, I primarily lean on the cheaper OSS Chinese models: MiniMax 2.1 for tool calling, GLM 4.7/Kimi K2/DeepSeek when extra intelligence is needed (at slower perf), and Gemini Flash for analyzing images, audio & PDFs.

I also find Nano Banana/Pro (Gemini Flash Image) consistently generates the highest-quality images vs GPT 1.5/SDXL/HiDream/Flux/ZImage/Qwen, and apparently my Pro sub includes up to 1000/day for Nano Banana or 100/day for Pro [1], so it's hard to justify using anything else.

If Gemini 3 Pro were a bit faster and Flash a bit cheaper (for API usage), I could easily see myself switching to Gemini for everything. If future releases get smarter and faster whilst remaining aggressively priced, I expect I will.

[1] https://support.google.com/gemini/answer/16275805?hl=en

  • Your first point

    > kind of surprised it was even offered and marketed given how painful it is to use. I thought it'd have to be damaging to the Gemini brand to get people to try it out, suffer painful UX then immediately stop using it.

    is immediately explained by your second point

    > Antigravity was also painful to use at launch where more queries failed then succeeded, however they've basically solved that now to the point where it's become my most used editor/IDE

    Switching tools is easy right now. Some people pick a tool and stick with it, but it's common to jump from one to the other.

    Many of us have the lowest tier subscriptions from a couple companies at the same time so we can jump between tools all the time.

    • Yeah except Gemini CLI is still bad after such a long time, every now and then I'll fire it up when I need a complex CLI command only to find that it hadn't improved and that I would've been better off asking an LLM instead. I don't quite understand its positioning, it's clearly a product of a lot of dev effort which I thought was for a re-imagined CLI experience, but I can't imagine anyone uses it as a daily driver for that.

      I retried Antigravity a few weeks after launch after Augment Code's new pricing kicked in, and was pleasantly surprised at how reliable it became and how far I got with just the free quota, was happy to upgrade to Pro to keep using it and haven't hit a quota since. I consider it a low tier sub in cost, but enables a high/max tier sub workflow.

  • I've only managed to hit the Opus 4.5 limit once after a really productive 4-hour session. I went for a cup of tea and by time I came back the limit had refreshed.

    I really think people are sleeping on how generous the current limits are. They are going to eat Cursor alive if they keep it this cheap.

    The IDE itself remains a buggy mess, however.

  • Not exactly sure why you're paying for Claude Pro; doesn't GH Copilot Pro give you Claude Opus 4.5 (which I'm assuming you're using, since it's SOTA for now)? OpenCode lets you use GH Copilot, so you can use OpenCode's ACP adapter and plug it into the IDE.

Why wouldn't google do well? They have one of the best data sources, which is a pretty big factor.

Also they have plenty of money, and talented engineers, and tensor chips, etc.

It would have to be significantly better than the competition for me to use a Google product.

  • Everyone’s all about OpenAI v Google, meanwhile i spend 99% of my day with Claude.

    For me personally it’s less about it having to be a Google product; it just needs to be better, and outside of image editing in Gemini 3 Pro Image, it is not.

    • I like Claude. I want to use it. But I just never feel comfortable with the usage limits (at the $20/month level at least), and it often feels like those limits are a moving, sometimes obfuscated target. Apparently something irritating happened with those limits over the holidays that convinced a colleague of mine to switch from Claude to Gemini, but I didn't dig for details.

      1 reply →

Is Gemini 3 still plagued by all these bugs in the software around it? The model is great, but I hit all these little issues (billing problems, attachments not accessible to the model, countless others).

Then there is the CLI: I always got "model is overloaded" errors, even after retrying weekly for a while. I found Google has a complex priority system where their bigger customers get priority (how much you spend determines your queue priority).

Has anybody done serious work with gemini-cli? Is it at Opus level?

It seems to me like this is yet another instance of just reading vibes, like when GPT 5 was underwhelming and people were like "AI is dead", or people thinking Google was behind last year when 2.5 pro was perfectly fine, or overhyping stuff that makes no sense like Sora.

Wasn't the consensus that 3.0 isn't that great compared to how it benchmarks? I don't even know anymore, I feel I'm going insane.

  • > It seems to me like this is yet another instance of just reading vibes, like when GPT 5 was underwhelming and people were like "AI is dead"

    This might be part of what you meant, but I would point out that the supposed underwhelmingness of GPT-5 was itself vibes. Maybe anyone who was expecting AGI was disappointed, but for me GPT-5 was the model that won me away from Claude for coding.

  • I have a weakly held conviction (because it is based on my personal qualitative opinion) that Google aggressively and quietly quantizes (or reduces compute/thinking on) their models a little while after release.

    Gemini 2.5 Pro 3-25 benchmark was by far my favorite model this year, and I noticed an extreme drop off of quality responses around the beginning of May when they pointed that benchmark to a newer version (I didn't even know they did this until I started searching for why the model degraded so much).

    I noticed a similar effect with Gemini 3.0: it felt fantastic over the first couple weeks of use, and now the responses I get from it are noticeably more mediocre.

    I'm under the impression all of the flagship AI shops do these kinds of quiet changes after a release to save on costs (Anthropic seems like the most honest player in my experience), and Google does it more aggressively than either OpenAI or Anthropic.

    • This has been a common trope here over the last couple of years. I really can't tell if the models get worse or it's in our heads. I don't use a new model until a few months after release, and I still have this experience. So they can't be degrading the models uniformly over time; it would have to be a per-user kind of thing. Possible, but then I should see a difference when I switch to my less-used (wife's) Google/OpenAI accounts, which I don't.

    • It's the fate of people relying on cloud services, including the complete removal of old LLM versions.

      If you want stability you go local.

      1 reply →

    • I can definitely confirm this from my experience.

      Gemini 3 feels even worse than GPT-4o right now. I don't understand the hype, or why OpenAI would need a red alert because of it.

      Both Opus 4.5 and GPT-5.2 are much more pleasant to use.

Hot take: they didn't; the pure players (OpenAI & Anthropic) just didn't go as fast as they claimed they would.

I feel like Gemini made a giant leap forward in its coding capabilities, and then in the past week or so it's become shit again, constantly dropping most of the code from my program when I ask it to add a feature. It's gone from incredible to basically useless.

I would add that OpenAI is doing a poor job in every respect.

Before this GPT nonsense they were such an aspiration for a better world. Then they quickly turned around, pushed core people out of the organization, and focused solely on capitalizing, and now they seem to be stuck in dead water.

I don't see any reason to use GPT-5 at all.

Gemini is amazing. I switched to it and haven't looked back at ChatGPT. Very fast, very accurate, and pulls on the whole set of knowledge Google has from search.

They didn't get an edge; they're giving Gemini Pro away free for a year to university emails. Obviously people will use it, and after a year everyone will drop it. People aren't paying for this.

Google's ham-fisted rollout of Bard as an answer to ChatGPT was a confounding variable, because otherwise there was little reason to doubt Google's ability to compete at AI over the long term. It's in their DNA.

The best decision for Google happened about 10 years ago, when they started manufacturing their own silicon for crunching neural nets. Whether it was a really good crystal ball back then, smart people, a time machine, or just luck, it pays off for them now. They don't need to participate in the Ponzi scheme that OpenAI, Nvidia, and Microsoft created, and they don't need to wait in line to buy Nvidia cards.

  • It had to have been launched longer ago than that because their first public-facing, TPU-using generative product was Inbox Smart Reply, which launched more than 10 years ago. Add to that however much time had to pass up to the point where they had the hardware in production. I think the genesis of the project must have been 12-15 years ago.

    • The Acquired podcast recently did a nice episode on the history of AI at Google, going back all the way to when they were working on "I'm Feeling Lucky", early versions of Translate, etc. All of that laid the groundwork for adding AI features to Google products and running them at Google scale. That started early in Google's history, when they still did everything on CPUs.

      The transition to using GPU accelerated algorithms at scale started happening pretty early in Google around 2009/2010 when they started doing stuff with voice and images.

      This started with Google just buying a few big GPUs for their R&D and then suddenly appearing as a big customer for NVidia who up to then had no clue that they were going to be an AI company. The internal work on TPUs started around 2013. They deployed the first versions around 2015 and have been iterating on those since then. Interestingly, OpenAI was founded around the same time.

      OpenAI has a moat as well in terms of brand recognition and diversified hardware supplier deals and funding. Nvidia is no longer the only game in town and Intel and AMD are in scope as well. Google's TPUs give them a short term advantage but hardware capabilities are becoming a commodity long term. OpenAI and Google need to demonstrate value to end users, not cost optimizations. This is about where the many billions on AI subscription spending is going to go. Google might be catching up, but OpenAI is the clear leader in terms of paid subscriptions.

      Google has been chasing different products for the last fifteen years in terms of always trying to catch up with the latest and greatest in terms messaging, social networking, and now AI features. They are doing a lot of copycat products; not a lot of original ones. It's not a safe bet that this will go differently for them this time.

      2 replies →

lol somebody got a fat check from google. American journalism has become such a joke

Being a monopoly worth trillions while having enough BUs to subsidize anything you can imagine does have its perks.

  • Also having invented the transformer architecture, doing "AI" since it was called "machine learning", and having data centers and TPUs that define state of the art.

    • Yes and they would have never been able to do any of this without their illegal monopoly that needs to be broken up and sliced/diced to benefit society.

  • Well sure, but lots of big companies have all the resources in the world and can't execute. Google really did turn things around in an impressive way.

  • Additionally, they have built-in distribution and integration for their products. I don’t know how folks don’t see that as a massive advantage.

    It’s like Microsoft and Internet Explorer in the 90s but on a much larger scale both in the breadth (number of distribution channels) and depth (market share of those channels in their respective verticals).

    • That's true. It's also a fine line to walk for Google.

      Google has recently received regulatory pressure, for instance, just like Microsoft had trouble in the late 90s.

  • Didn't work for Meta

    • Or Apple. Also Microsoft who were the ones bankrolling OpenAI before it started floating on its own equity.

      In point of fact money to throw at AI development is essentially free, not one of the big players sustains itself on income. Investors are throwing every dollar they have at everyone with a usable pitch.

      Whatever advantages Google had, financial stability is way, way down the list. For better theories, look to "Proven Leader at Scaling Datacenters" and "Decoupled from the CUDA Cartel via being an early moving on custom AI hardware".

    • The issue with Meta is entirely confined to a single individual, Zuckerberg.

  • I think the single biggest bad thing Google does (Android and YouTube bad practices [1] aside):

    Google taxes every brand and registered trademark.

    The URL bar is no longer a URL bar. It's a search bar. Google used monopoly power to take over 90% of them.

    Now every product, every brand, every trademark competes in a competitive bidding process for their own hard-earned IP and market. This isn't just paying a fee, it's a bidding war with multiple sides.

    If you want to strike Google at their heart, make it illegal to place ads against registered trademarks.

    I'm going to launch "trademark-extortion.org" (or similar) and run a large campaign to reach F500 CEOs and legislators. This needs to end. This is the revenue source that has allowed Google to wreak incredible harm on the entire tech sector. Happy to send this to Sam Altman and Tim Sweeney as well.

    [1] Rug-pulling web installs; leveraging 3rd party vendors to establish market share and treating them like cattle; scare walls and defaults that ensure 99.99% of users wind up with ads, Google payment rails, etc. ; Google ads / chrome / search funnel ; killing Microsoft's smartphone by continually gimping YouTube and Google apps on the platform ; etc. etc. etc. The evilest company.

I think Gemini is still far behind.

  • I did some tests with heavily math-oriented programming, using ChatGPT and Gemini to rubber-duck (not agentic): going over C performance tuning, checking C code for possible optimizations, working through math-heavy and number-theory code, and optimizing threading, memory throughput, etc. to make the thing go faster, then benchmarking runs of the updated code. Gemini was by far the better of the two in this domain. I was able to test changes by benchmarking, and for my use case it was night and day: Gemini's advice was generally quite strong and significantly improved benchmarked performance, while ChatGPT was far less useful. What works for you will depend on your use case, how well your prompting is tuned to the system you're using, and who knows what other factors, but I have a lot of benchmarks that are clear evidence of the opposite of your experience.
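    The benchmark-the-change loop described above can be sketched with a tiny harness (my own illustration, not the commenter's code; the binary names are placeholder assumptions):

```python
# Minimal benchmark harness: run a compiled binary several times and
# keep the best wall-clock time, so model-suggested optimizations can
# be compared on numbers rather than vibes.
import subprocess
import time


def bench(cmd: list[str], runs: int = 5) -> float:
    """Return the best wall-clock time (seconds) over `runs` runs.

    Taking the minimum filters out scheduler noise; also report the
    median if your workload has real run-to-run variance.
    """
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd, check=True, capture_output=True)
        best = min(best, time.perf_counter() - start)
    return best


# Hypothetical usage comparing a baseline against a tuned build:
#   baseline = bench(["./matmul_baseline"])
#   tuned = bench(["./matmul_tuned"])
#   print(f"speedup: {baseline / tuned:.2f}x")
```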

    • Which models? It's completely uninformative to say you compared "ChatGPT" and "Gemini." Those are both just brand names under which several different models are offered, ranging from slow-witted to scary-smart.

  • why?

    • I'm not OP, but I asked it today to explain how proxies work, and it felt like it couldn't give me an answer it couldn't attribute to a link. That sounds good on paper, but the problem is that no matter what you ask, you end up with very similar answers, because it has less freedom in generating text. And even though it attributes everything to a link, there is no guarantee the link actually says what the LLM claims it does, so it's the worst of both worlds.

      ChatGPT on the other hand was able to reformulate the explanation until I understood the part I was struggling with, namely what prevents a proxy from simply passing its own public key as the website's public key. It did so without citing anything.
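      The certificate question at the end has a concrete answer: the server's public key arrives inside an X.509 certificate signed by a CA the client already trusts, so a proxy substituting its own key would fail signature verification. A minimal sketch of that client-side check (my own illustration, using Python's standard ssl module, not anything from the thread):

```python
# Sketch of why a man-in-the-middle proxy can't simply present its
# own public key as the website's: the key is delivered inside a
# CA-signed certificate, and the client verifies that signature.
import socket
import ssl


def fetch_validated_cert(host: str, port: int = 443) -> dict:
    """Connect over TLS and return the peer certificate.

    The default context loads the system CA store and raises
    ssl.SSLCertVerificationError if the presented chain doesn't lead
    back to a trusted CA -- which is exactly what happens when an
    interposing proxy swaps in its own public key.
    """
    ctx = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=10) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()


# The default context refuses unauthenticated peers and checks that
# the certificate's name matches the host we asked for:
ctx = ssl.create_default_context()
assert ctx.verify_mode == ssl.CERT_REQUIRED
assert ctx.check_hostname is True
```

      (A corporate MITM proxy only works because its own CA certificate has been installed into the client's trust store, at which point the signature check passes again.)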

      2 replies →

lol, the story Disney did not make.

Just like the Disney movie, no touchy the Gemini.

It did? Just a minute ago Gemini told me, again, that it can't use my Google Workspace. The query had nothing to do with any Workspace feature. It just randomly tells me that in the middle of any "conversation".