Comment by WarmWash

8 hours ago

What's interesting to note, as someone who uses Gemini, ChatGPT, and Claude, is that Gemini consistently uses drastically fewer tokens than the other two. It seems like Gemini is where it is because it has a much smaller thinking budget.

It's hard to reconcile this, because Google likely has the most compute at the lowest cost, so why aren't they gassing the hell out of inference compute like the other two? Maybe all the other services they provide are too heavy? Maybe they're trying to be more training-heavy? I don't know, but it's interesting to see.

I've been trying Gemini Pro through their $20-ish Google One subscription for a couple of months, and I also find it consistently does fewer web searches to verify information than, say, ChatGPT 5.4 Pro, which I have through work.

I was planning on comparing them on coding, but I couldn't get the Gemini VSCode add-in to work, so no dice.

The Android and web apps are also riddled with bugs, including one that makes you lose your chat history if you switch between threads. Not cool.

I'll be cancelling my Google One subscription this month.

  • I don't sweat sources and almost never check them. I usually prefer to manually verify information after it's provided, to prevent the model from borking its context trying to find sources that justify its already-computed output. Almost all the knowledge is already baked into the latent space of the model, so citing sources is generally a backwards process.

    I see it like going to the doctor and asking them to cite sources for everything they tell me. It would be ridiculous and totally make a mess of the visit. I much prefer just taking what the doctor said on the whole, and then verifying it myself afterwards.

    Obviously there is a lot of nuance here: areas with sparse information, and certainly things that exist past the knowledge cutoff. But if I'm researching cell structure, I'm not going to muck up my context making it dig for sources for things that are certainly already optimal in the latent space.

  • You're supposed to download the Antigravity VSCode fork and use that, and it's rough at best, I think. Hey, free Opus tokens though.

They have to have SOME competitive advantage. What reason is there to use Gemini over Claude or ChatGPT? It's not producing nearly the quality of output.

  • I recently did my taxes using all three models (My return is ~50 pages, much more than a standard 1040).

    GPT (Codex) was accurate on the first run and took 12 minutes.

    Gemini (Antigravity) missed one value because it didn't load the full 1099 PDF (the laziness), but corrected it when prompted. However, it only spent 2 minutes on the task.

    Claude (CC) made all manner of mistakes; I had to wait overnight for it to finish because it hit my limit before doing so. However, Claude did the best on the next step of actually filling out the PDF forms, though it ended up not mattering.

    Ultimately I used Gemini in Chrome to fill out the forms (freefillableforms.com), but frankly it would have been faster to do it manually, copying from the spreadsheets GPT and Gemini output.

    I also use Antigravity a lot for small greenfield projects (<5k LOC). I don't notice a difference between Gemini and Claude, outside usage limits. Besides that, I mostly use Gemini for its math and engineering capabilities.

    • Yep, I've found Gemini to be the best LLM at most tasks that are not coding. Sometimes Opus wins for engineering, but Gemini holds its own there as well. I also used Gemini to assist me with understanding the details of my (pre-revenue) C-Corp taxes this year. It did a pretty good job walking me through each question I had and raising concerns about things I might have overlooked. I validated everything against reliable sources, of course.

      Gemini missed some nuances about Delaware's paperwork processes. It repeatedly assumed I could do something instantly via an online portal when the task actually required either snail mail or an intermediary with API access to Delaware's systems. In the end, these processes took a couple of days, and while I got things done in time, I wish I hadn't taken questions of process at face value, and had instead kicked off the taxes at the end of February rather than the week before they were due.

  • Well, comparing Gemini 3.1 Pro vs ChatGPT 5.4 Pro, Gemini is much faster at replying. Of course, if it actually thinks less, that helps a lot. For most of my personal and work use cases, I prefer waiting a bit longer for a better answer.

They just released their enterprise agentic platform today, so my expectation is that it might be the gravity well where the Fortune 500s park their inference.

I'm 50% convinced that the main lift in GLM-5 over GLM-4.7 was that it was much more willing to use tokens. I had the hardest time getting 4.7 to read enough source code to actually know what it was doing, but once I convinced it to read, it was pretty capable.

Being thrifty can be good! But it can also mean your system isn't reflecting enough, isn't considering enough factors, isn't reading enough source code.

We are still firmly in "who really knows" territory. I have mixed feelings about token spendiness vs thrift, is all.