Comment by hedora

2 hours ago

I think the headline result is the cross product table.

Gemini Pro + Search agreed with Gemini Pro w/o Search 75% of the time, and with everybody else about 50% of the time. No other model had access to search.

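So 75% of the time search isn't changing the fact-checking result, let alone improving it (probably a bad system prompt and/or bad fact-checking queries). And the roughly 50% agreement with everybody else is what you'd expect from coin flips: ask the models to flip a coin, and they do.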
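For anyone who wants to check this kind of table themselves, here is a rough sketch of how pairwise agreement could be computed. It assumes each model gives one binary verdict per claim, which may not match how the article scored things. The model names and verdicts below are made up for illustration and are not the article's data.

```python
from itertools import combinations


def agreement_rate(a, b):
    """Fraction of positions where two equal-length verdict lists match."""
    if len(a) != len(b) or not a:
        raise ValueError("verdict lists must be non-empty and the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def agreement_table(verdicts):
    """Map each unordered pair of model names to its agreement rate."""
    return {
        (m1, m2): agreement_rate(verdicts[m1], verdicts[m2])
        for m1, m2 in combinations(sorted(verdicts), 2)
    }


# Hypothetical toy verdicts (True = "claim checks out"); NOT real results.
toy = {
    "model_a": [True, True, False, False],
    "model_b": [True, True, False, True],
    "model_c": [True, False, True, False],
}

table = agreement_table(toy)
for pair, rate in table.items():
    print(pair, rate)
```

As a baseline, two independent fair coin flippers giving binary verdicts agree 50% of the time, so a pair sitting near 0.5 on a yes/no task is not telling you much.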
