Comment by nomel

10 hours ago

> The frontier LLMs are getting pretty good at checking this sort of thing.

No, this is career-ending, high-stakes stuff. It requires old-school "actually check a record of reality" type methods, like a database query or an HTTP GET to one of the many services that hold this info.
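For illustration, a minimal sketch of that kind of "check a record of reality" lookup against the public CrossRef REST API (the DOI below is a placeholder, and a real checker would compare more than the title):

```python
# Minimal sketch: check that a cited DOI actually resolves to a real record,
# via the public CrossRef REST API. The DOI below is a placeholder.
import requests

def lookup_doi(doi: str) -> dict | None:
    """Return CrossRef metadata for a DOI, or None if it doesn't resolve."""
    resp = requests.get(f"https://api.crossref.org/works/{doi}", timeout=10)
    if resp.status_code != 200:
        return None  # no such record: the citation may be fabricated
    return resp.json()["message"]

record = lookup_doi("10.1000/example-doi")  # placeholder DOI
if record is None:
    print("No record found: flag for review")
else:
    print("Resolved title:", record.get("title"))
```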

I think they're saying that frontier LLMs may be usable to spot citations that are correct by shape (a real citation) but incorrect by usage (unrelated to the text).

I kind of hate the idea, but you probably could do a lazy LLM check of every paper and every citation and have it flag possibly wrong (in the second sense) citations for human review.

But you'd need a LOT of tokens and a LOT of human-hours.
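For what it's worth, a minimal sketch of that kind of lazy per-citation check, assuming an OpenAI-style chat API (the model name, prompt, and flag_citation helper are all illustrative, not a real pipeline):

```python
# Sketch: ask an LLM whether a cited work plausibly supports the claim it's
# attached to. Model name and prompt are placeholders; treat the output as a
# suggestion for human review, not a verdict.
from openai import OpenAI

client = OpenAI()

def flag_citation(claim: str, cited_abstract: str) -> bool:
    """Return True if the citation looks unrelated and should be reviewed."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model
        messages=[
            {"role": "system",
             "content": "Answer YES if the cited abstract supports the claim, NO otherwise."},
            {"role": "user",
             "content": f"Claim: {claim}\n\nCited abstract: {cited_abstract}"},
        ],
    )
    answer = (resp.choices[0].message.content or "").strip().upper()
    return answer.startswith("NO")

if flag_citation("Transformers were introduced in 2017.",
                 "We propose the Transformer, a model architecture ..."):
    print("Possible wrong citation: flag for human review")
```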

  • > have it flag possibly wrong (in the second sense) citations for human review

    And then what, we're done? How have we avoided the need for the same exhaustive human review? It only saves human review time if you trust the LLM not to miss things.

    • If the goal is to review every citation fully with 100% accuracy, then, sure, exhaustive human review is needed. But I suspect human review of a random sample would add value, catching some fraud, missing others, but having zero false positives (or as close to zero as human review can get).

      An LLM could replace the random sampling. It doesn't need to be particularly good for the approach to provide value. I would worry about LLM bias though.

      Another thing to consider is that readers can detect fake citations after publication and report them to arXiv, at which point the author gets banned.

LLMs can make tool calls to do database and HTTP queries to search for, buy, and cross-reference a citation.
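A minimal sketch of what that could look like, assuming the OpenAI tool-calling API (the tool schema, model name, and DOI are illustrative placeholders):

```python
# Sketch: expose a CrossRef DOI lookup as a tool the model can request.
# Tool schema, model name, and DOI are illustrative placeholders.
from openai import OpenAI

client = OpenAI()

tools = [{
    "type": "function",
    "function": {
        "name": "lookup_doi",
        "description": "Fetch bibliographic metadata for a DOI from CrossRef.",
        "parameters": {
            "type": "object",
            "properties": {"doi": {"type": "string"}},
            "required": ["doi"],
        },
    },
}]

resp = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model
    messages=[{"role": "user",
               "content": "Does DOI 10.1000/example-doi exist, and what is its title?"}],
    tools=tools,
)

msg = resp.choices[0].message
if msg.tool_calls:
    # The caller would now run the real HTTP query (e.g. the lookup_doi sketch
    # above) and feed the result back to the model as a "tool" message.
    call = msg.tool_calls[0]
    print("Model requested:", call.function.name, call.function.arguments)
```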