Comment by Tiberium

7 hours ago

You can easily do this with regular GPT 5.2 in ChatGPT: turn on thinking (ideally extended) and web search, point the model at a Wikipedia page, and tell it to check the claims for errors. I've tried it before, and surprisingly it finds errors quite often, sometimes small, sometimes medium. The less popular the page, the more likely it is to have errors.

This works because GPT 5.x actually uses web search properly.

Can you describe some of the errors you have found this way?

  • Usually they are smaller details in the pages, not the core claims, but that doesn't really refute my point that GPT can easily point them out. Here are some examples. I'm not including every error GPT 5.1 found per page back then (this reply is already too long), just a few.

    https://en.wikipedia.org/wiki/Large_Hadron_Collider

    > (infobox) Maximum luminosity 1×10^34/(cm2⋅s)

    This is from the original design, LHC has been upgraded several times, e.g. if you check https://home.web.cern.ch/news/news/accelerators/lhc-report-r..., you see "Thanks to these improvements, the instantaneous luminosity record was smashed, reaching 2.06 x 10^34 cm^(-2) s^(-1), twice the nominal value." and that was in 2017.

    > The first collisions were achieved in 2010 at an energy of 3.5 tera-electronvolts (TeV) per beam

    This is wrong: if you check https://home.web.cern.ch/resources/faqs/facts-and-figures-ab... it says "23 November 2009: LHC first collisions (see press release)" - https://home.web.cern.ch/news/press-release/cern/two-circula... - and the energy was 450 GeV.

    Another random example, I was reading https://en.wikipedia.org/wiki/Camponotus_japonicus (a very small article) and decided to ask GPT about it. It checked a lot of other sources and found out that no other source claims that this species of ant inhabits Iran.

    Another one: https://en.wikipedia.org/wiki/Java_(software_platform)

    > and—until its discontinuation in JDK 9—a browser plug-in

    In reality it was deprecated in JDK 9 and removed in JDK 11; most people would read "discontinuation" as meaning it was already removed in JDK 9.

    https://en.wikipedia.org/wiki/Nekopara

    > The Opening theme for After, "Contrail" was composed by "Motokyio" and Sung by "Ceul".

    Just two misspellings: it should be Motokiyo and Ceui.

    > A manga adaptation illustrated by Tam-U is currently being published

    This section hasn't been updated; the manga actually finished a long time ago.

    ===

    Here's a direct excerpt from GPT 5.1's response regarding luminosity (I tried this back in November, before GPT 5.2 existed). It also included a citation in the second paragraph to the exact link I used above for the luminosity claim.

    – The infobox lists “Maximum luminosity 1×10^34/(cm²·s)” without qualification.

    – That number is the original design (nominal) peak luminosity for the LHC, but the machine has substantially exceeded it in routine operation: CERN operations reports show peak instantaneous luminosities of about 1.6×10^34 cm⁻²·s⁻¹ in 2016 and ≈2.0–2.1×10^34 cm⁻²·s⁻¹ in 2017–2018, roughly a factor of two above the nominal design.

    – Since the same infobox uses the current maximum beam energy (6.8 TeV per beam) rather than the 7 TeV design value, presenting 1×10^34 cm⁻²·s⁻¹ as “Maximum luminosity” is misleading/outdated if read as the machine’s achieved maximum. It should either be labelled explicitly as “design luminosity” (with a note that higher values have been reached) or the numerical value should be updated to reflect the achieved peak.

I am sure that could be useful with proper post-request research.

As a technique, though, never ask an LLM only to find errors. Ask it to either find errors or verify that there are none. Giving it an honest way to report "no errors" makes it less likely to hallucinate problems that aren't there.
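The framing above can be sketched as a prompt template. This is illustrative only; the function name, wording, and example URL are my own choices, not a tested recipe:

```python
# Sketch of the "find errors OR verify none" framing described above.
# The exact wording is an assumption; the point is that the prompt
# explicitly offers the model a "no errors found" exit.

def build_check_prompt(page_url: str) -> str:
    """Build a claim-checking prompt that doesn't presuppose errors exist."""
    return (
        f"Read {page_url} and cross-check its factual claims "
        "against other sources. "
        "Either (a) list any claims that appear wrong or outdated, "
        "citing sources, or (b) state that you verified the claims and "
        "found no errors. "
        "Do not invent errors just to have something to report."
    )

prompt = build_check_prompt("https://en.wikipedia.org/wiki/Large_Hadron_Collider")
print(prompt)
```

The template would then be sent to whatever model and web-search setup you use; the key design choice is that both outcomes are legitimate answers.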

  • > As a technique though, never ask an LLM to find errors.

    What I do is ask it both to explain why there are no errors at all and why there are tons of errors. Then I use my natural intelligence to reason about the competing claims.
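    The two-sided approach could be sketched as a pair of opposing prompts (again, the wording and helper name here are hypothetical; the human still arbitrates between the two outputs):

    ```python
    # Sketch of the "argue both sides" technique: one prompt defends the
    # page, one attacks it, and the reader weighs the competing cases.

    def build_debate_prompts(page_url: str) -> tuple[str, str]:
        defend = (
            f"Read {page_url} and make the strongest case that its "
            "factual claims are accurate, citing supporting sources."
        )
        attack = (
            f"Read {page_url} and make the strongest case that it "
            "contains factual errors, citing contradicting sources."
        )
        return defend, attack

    defend, attack = build_debate_prompts("https://en.wikipedia.org/wiki/Nekopara")
    ```

    Claims that survive the "attack" prompt while also appearing in the "defend" prompt's sourcing are the ones most likely to be solid.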