Comment by EdNutting
6 days ago
Yeah, I tried building such a tool. The problem was twofold:
1) Automated fetching of papers is difficult. API approaches are limited and often require per-journal development; scraping approaches are largely blocked; and AI approaches require web fetch tools, which are often blocked and, when they aren't, consume a lot of credits/tokens very quickly.
2) AI generates so many hallucinated citations that it’s very hard to know what a given citation was even supposed to be. Sure, you can verify one link, but when you start trying to verify and correct 20 to 40 citations, you end up having to deal with hundreds or thousands of citations just to get to a small number of accurate and relevant ones. That rapidly runs you out of credits/tokens on Claude, and API pricing is insane for this use case. It’s not possible to just verify the link, as an HTTP 200 status isn’t enough to be confident that the paper actually exists and actually contains the content the AI was trying to cite. And if it requires human review anyway, then the whole thing is pointless, because a human could search, read and create citations more quickly than the AI tool could (bearing in mind that most researchers aren’t starting from scratch: they build up a personal ‘database’ of useful papers relevant to their work, and having an AI search it doesn’t save any meaningful amount of work, so the focus has to be on discovering new citations).
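A minimal sketch of a check that goes one step beyond the 200 status in point 2, assuming the citation comes with a DOI and that Crossref's public REST API (`https://api.crossref.org/works/{doi}`, whose JSON response holds a `message` object with a `title` list) is reachable. The function names and the 0.9 similarity threshold are hypothetical choices for illustration. Note that this only confirms the DOI points at a paper whose title matches the citation; it says nothing about whether the paper contains the content being cited, which is exactly the part that still needs a human.

```python
import json
import re
import urllib.parse
import urllib.request
from difflib import SequenceMatcher


def fetch_crossref_metadata(doi: str) -> dict:
    """Hypothetical helper: look up a DOI via Crossref's REST API (needs network).

    Returns the "message" object, which includes fields such as "title" (a list).
    """
    url = "https://api.crossref.org/works/" + urllib.parse.quote(doi)
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)["message"]


def normalize(text: str) -> str:
    # Lowercase, turn runs of punctuation into spaces, and collapse whitespace,
    # so trivial formatting differences don't affect the comparison.
    return " ".join(re.sub(r"[^a-z0-9]+", " ", text.lower()).split())


def title_matches(cited_title: str, metadata: dict, threshold: float = 0.9) -> bool:
    """True if the cited title is close to any title in the metadata record.

    The 0.9 threshold is an assumption; near-miss titles can still pass it.
    """
    cited = normalize(cited_title)
    for candidate in metadata.get("title", []):
        if SequenceMatcher(None, cited, normalize(candidate)).ratio() >= threshold:
            return True
    return False


# Offline usage with a hand-written record standing in for a Crossref response.
record = {"title": ["Attention Is All You Need"]}
print(title_matches("Attention is all you need.", record))
print(title_matches("Deep Residual Learning for Image Recognition", record))
```

Even this cheap check needs one network round trip per citation, which is where the cost described above starts to add up once you're triaging hundreds of candidates.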
All in all, AI is a very poor tool for this part of the problem, and the pricing of AI tools and/or APIs is high enough to be a barrier to this use case (partly due to token costs, and partly because the web search and web fetch tools are comparatively expensive).
Interesting. Tools like Zotero seem to have sorted out PDF fetching (and metadata + abstract fetching, even without institutional access to the PDF). Did you try building the fetching on top of that?
AFAICT Zotero relies on scanning what you browse, not on suggesting citations based on a draft paper. It’s not solving the same problem.
I meant for point 1. Zotero will accept a DOI/arXiv link (among others) and download the public metadata (authors, journal, abstract) for you, so you don't need to build something for that part. The AI cites a paper, you copy the DOI into Zotero, then analyze the info Zotero returns.
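The first step of that workflow is pulling a clean identifier out of whatever link the AI produced, so it can be pasted into Zotero. A minimal sketch, assuming the links are either new-style arXiv URLs (`YYMM.NNNNN`) or contain a modern DOI; the regex is adapted from Crossref's published pattern for modern DOIs and will miss some legacy ones. `extract_identifier` is a hypothetical name, not part of Zotero.

```python
import re
from typing import Optional, Tuple

# Assumed pattern for modern DOIs (adapted from Crossref's guidance);
# legacy DOIs with characters like '<' or '>' are not matched.
DOI_RE = re.compile(r"10\.\d{4,9}/[-._;()/:A-Za-z0-9]+")
# New-style arXiv IDs in abs/pdf URLs; an optional version suffix is dropped.
ARXIV_RE = re.compile(r"arxiv\.org/(?:abs|pdf)/(\d{4}\.\d{4,5})(?:v\d+)?", re.IGNORECASE)


def extract_identifier(link: str) -> Optional[Tuple[str, str]]:
    """Hypothetical helper: return ("arxiv", id) or ("doi", doi), or None."""
    arxiv = ARXIV_RE.search(link)
    if arxiv:
        return ("arxiv", arxiv.group(1))
    doi = DOI_RE.search(link)
    if doi:
        # Trailing punctuation is usually sentence text, not part of the DOI.
        return ("doi", doi.group(0).rstrip(".,;)"))
    return None


print(extract_identifier("https://arxiv.org/abs/1706.03762v5"))
print(extract_identifier("See https://doi.org/10.1038/nature14539."))
```

Checking arXiv URLs first keeps the version suffix out of the ID; anything that returns `None` is a good first signal that the citation needs a closer look.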