Comment by jdw64

17 hours ago

Interpreting these metrics is quite interesting.

One thing for sure is that while Claude is currently taking the #1 spot in mentions, it carries a lot of negative sentiment due to API pricing policies and frequent server downtime. On the other hand, the runner-up, GPT-5.5, actually seems to have more positive feedback.

Personally, my experience with Codex wasn't as good as with Claude Code (Codex freezes on Windows more often than you'd expect), so this is a bit surprising. That said, GPT, for all its defensiveness, is definitely better in terms of sheer code-writing capability. However, GPT has quite a few issues with text corruption when generating Korean or Chinese, something English-speaking users probably won't notice. In terms of model capabilities, when given the same agent.md (CLAUDE.md) file, I think GPT is better at writing code, while Claude is better at writing prose during code reviews.

Looking at the bottom right, Qwen and DeepSeek are open-source, so they are largely mentioned in the context of guarding against vendor lock-in, which drives positive sentiment. Considering that Hacker News occasionally shows negative sentiment toward China, the fact that they are viewed this positively—unlike US models—shows that being open-source is a massive advantage in itself.

Anyway, one thing for sure is that Gemini is pretty much unusable.

I think it's premature to compare models using the same .md file, since they respond quite differently to the same input. I try to narrow the field to the top 2-3 and then refine inputs for each one. For me it's unfortunately not much better than an intuitive process of trial and error.

Gemini is not at all unusable. It is quite usable for the tasks it excels at - to the point that it's my top pick for many of them, and I spend more money there than anywhere else. On the other hand, it responds quite differently from the other major models: Claude and GPT are similar to each other, while Gemini requires a different approach. In my opinion, people who think Gemini is worthless haven't learned how to prompt it correctly. Again, it's an intuitive process of watching concrete response differences from small input changes, but if I had to summarize: it shows its Google Books / Google Scholar roots.

I have started experimenting with qwen more than deepseek, but I have not had good results yet. Given the good press I presume I will learn how to interact with it for better results.

Curious if others have similar experiences in comparing models usefully, or if most don't bother with this, or do something else? I mainly use models for highly focused specialty tasks, so this fine tuning makes the difference between usable and unusable. I don't yet have the luxury of defining my preferred workflow and finding the tool for the task. Everything just breaks almost immediately if I try to shoehorn into my preferred flow.

  • What are your prompting and general tips for using Gemini effectively?

    And what use cases do you think it’s best suited for?

I like your analysis, but I think the open models are genuinely well received not only because of vendor lock-in concerns or being open source.

They are cheaper! All signals point to them staying cheaper because they are built more sustainably. Also, some of the latest entries can run on a single GPU - literally available at your desktop, with no service interruptions and not even network latency. People are one- and few-shotting little games for zero dollars because they bought a GPU to play video games this year. To me that's unbeatable value. Once the tooling catches up, and after a few more model releases, it could change everything completely.

I know it's subjective, but I tried different models with my OpenRouter subscription and the VSCode Roocode plugin, evaluating them on cost and code quality. I liked gemini-3-flash-preview.

It's really a cost-effective model.
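Since OpenRouter exposes every model behind the same OpenAI-compatible chat API, this kind of comparison boils down to swapping the `model` field in an otherwise identical request and tracking cost per call. A minimal sketch - the non-Gemini model IDs and all per-token prices below are made-up placeholders, not real quotes:

```python
# Compare candidate models on estimated cost for the same prompt.
# Model IDs other than the Gemini one, and all prices, are illustrative
# assumptions; check openrouter.ai for current model slugs and pricing.

PROMPT = "Refactor this function to be idiomatic Python: ..."

# (model id, assumed $/1M input tokens, assumed $/1M output tokens)
CANDIDATES = [
    ("google/gemini-3-flash-preview", 0.10, 0.40),
    ("anthropic/claude-placeholder", 3.00, 15.00),
    ("openai/gpt-placeholder", 2.00, 8.00),
]

def payload(model: str) -> dict:
    """Identical OpenAI-style request body; only the model field changes."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": PROMPT}],
    }

def estimated_cost(in_price: float, out_price: float,
                   in_tokens: int = 2_000, out_tokens: int = 500) -> float:
    """Rough dollars per call for an assumed token budget."""
    return (in_tokens * in_price + out_tokens * out_price) / 1_000_000

# Rank candidates cheapest-first before judging code quality by hand.
ranked = sorted(CANDIDATES, key=lambda c: estimated_cost(c[1], c[2]))
for model, inp, outp in ranked:
    print(f"{model}: ~${estimated_cost(inp, outp):.4f}/call")
```

The actual responses still have to be judged by eye, but fixing the prompt and token budget keeps the cost ranking apples-to-apples.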

I had a surprisingly positive experience with Gemini optimizing some mathy MPS code. It did far better than claude.

Of course, when I tried it on something else it rewrote every line in the file for no good reason, applied changes directly when I told it just to plan, etc.

So maybe it has one strength.

  • Gemini is actually really good for code review, critique, and other tasks. It just cannot be allowed to write code itself.

Yeah, I think we are pretty much past the idea of "better" and are at the point where it needs qualification as "better at". "Claude writes, Codex reviews, and Gemini doesn't get installed" is my go-to, although I turn to Gemini whenever I want an advanced graphical calculator, or data extraction of any type.

  • "Gemini researches" has been my go-to for a while (although GPT seems to have gotten better recently in this category?).

    Essentially, I use it when I truly only need an "Advanced Google" to find lots of document or website references based on only some partial understanding of "X". I don't like having it do anything with those things. Only when I need to find those things.

    Claude, especially, seems to absolutely hate doing research when there are major ambiguities in your question. It's the only one of the major models that keeps playing 20 questions with me when I neither know nor care what the answers to those questions are.

  • Mostly my experience, but “Gemini crunches data” would be my replacement there.

    If I have a task that requires parsing through swathes of irregular data that traditional ML would choke on (or that would require an intermediate training step à la BigQuery), I have gotten much better results from Gemini than from the other two.

> Anyway, one thing for sure is that Gemini is pretty much unusable

Ha! I find that Gemini is quite useful - if only because I am forced to use it (on my personal projects) because it's the only one that has unlimited interaction for "free"

It has its limitations, yes, but so does Claude (which I am leaning on too heavily at work at the moment)