Comment by tempestn
15 hours ago
In my experience Gemini 3.0 pro is noticeably better than chatgpt 5.2 for non-coding tasks. The latter gives me blatantly wrong information all the time, the former very rarely.
Strange that you say that, because the general consensus (and my experience) seems to be the opposite, as does the AA-Omniscience Hallucination Rate Benchmark, which puts 3.0 Pro among the higher-hallucinating models. 3.1 seems to be a noticeable improvement, though.
Google actually has the BEST ratings in the AA-Omniscience Index, which measures knowledge reliability and hallucination (higher is better): it rewards correct answers, penalizes hallucinations, and has no penalty for refusing to answer.
Gemini 3.1 holds the top spot, followed by 3.0 and then Opus 4.6 Max.
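For what it's worth, here is a rough Python sketch of a scoring rule like the one described above: correct answers rewarded, hallucinations penalized, refusals free. The exact weighting Artificial Analysis uses may differ, so the function name and numbers are illustrative only.

    # Rough sketch of the scoring rule described above, NOT the site's exact formula:
    # +1 for a correct answer, -1 for a hallucinated (wrong) answer, 0 for a refusal.
    def omniscience_style_index(correct, wrong, refused):
        total = correct + wrong + refused
        return (correct - wrong) / total  # higher is better; refusing costs nothing

    # Guessing wrong is penalized, refusing is not:
    print(omniscience_style_index(80, 18, 2))   # 0.62
    print(omniscience_style_index(80, 2, 18))   # 0.78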
This isn't actually correct.
Gemini 3.0 gets a very high score because it's very often correct, but it does not have a low hallucination rate.
https://artificialanalysis.ai/#aa-omniscience-hallucination-...
It looks like 3.1 is a big improvement in this regard; it hallucinates a lot less.
> the AA-Omniscience Hallucination Rate Benchmark, which puts 3.0 Pro among the higher-hallucinating models. 3.1 seems to be a noticeable improvement, though.
As the sibling comment says, the AA-Omniscience Hallucination Rate Benchmark puts Gemini 3.0 as the best-performing model aside from the Gemini 3.1 preview.
https://artificialanalysis.ai/evaluations/omniscience
You are misreading the benchmark.
https://artificialanalysis.ai/#aa-omniscience-hallucination-...
If you look at the results, 3.0 hallucinates an awful lot when it's wrong.
It's just not wrong that often.
(And it looks like 3.1 does better on both fronts)
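A quick hypothetical shows how both readings can be true at once: a model can lead an index like this while still hallucinating on most of the questions it gets wrong. The hallucination-rate definition below (wrong answers as a share of non-correct responses) is an assumption for illustration, not necessarily how Artificial Analysis computes it.

    # Assumed definition, for illustration only: of the responses that are NOT
    # correct, the share that are wrong answers rather than refusals.
    def hallucination_rate(wrong, refused):
        return wrong / (wrong + refused)

    # "Model A": correct 80%, wrong 18%, refuses 2%
    #   index-style score: 0.80 - 0.18 = 0.62 (high)
    print(hallucination_rate(0.18, 0.02))   # ~0.90: hallucinates on 90% of what it misses
    # "Model B": correct 60%, wrong 10%, refuses 30%
    #   index-style score: 0.60 - 0.10 = 0.50 (lower)
    print(hallucination_rate(0.10, 0.30))   # 0.25: usually refuses rather than guessing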
I can only speak to my own experience, but for the past couple of months I've been duplicating prompts across both for high-value tasks, and that has been my consistent finding.
I agree, and it has been my almost exclusive go-to ever since Gemini 3 Pro came out in November.
In my opinion, Google isn't as far behind in coding as the comments here would suggest. With Fast, it might already have edited 5 files before Claude Sonnet has finished processing your prompt.
There is a lot of potential here, and with Antigravity as well as Gemini CLI (I did not test that one) they are working on capitalizing on it.
Google is good for answering questions, but its writing is lacking. I've had to deal with Gemini slop, and it's worse than ChatGPT's.