Comment by i_am_proteus
1 day ago
>this error does make me pause to wonder how much of the rest of the paper used AI assistance
And this is what's operative here. The error spotted, indeed the entire class of error spotted, is easily checked and verified by a non-domain expert. These are the errors we can confirm readily, with an obvious and unmistakable signature of hallucination.
If these are the only errors, we are not troubled. However: we do not know if these are the only errors, they are merely a signature that the paper was submitted without being thoroughly checked for hallucinations. They are a signature that some LLM was used to generate parts of the paper and the responsible authors used this LLM without care.
Checking the rest of the paper requires domain expertise, perhaps requires an attempt at reproducing the authors' results. That the rest of the paper is now in doubt, and that this problem is so widespread, threatens the validity of the fundamental activity these papers represent: research.
> If these are the only errors, we are not troubled. However: we do not know if these are the only errors, they are merely a signature that the paper was submitted without being thoroughly checked for hallucinations. They are a signature that some LLM was used to generate parts of the paper and the responsible authors used this LLM without care.
I am troubled by people using an LLM at all to write academic research papers.
It's a shoddy, irresponsible way to work. And also plagiarism, when you claim authorship of it.
I'd see a failure of the 'author' to catch hallucinations as more like a failure to hide evidence of misconduct.
If academic venues are saying that using an LLM to write your papers is OK ("so long as you look it over for hallucinations"?), then those academic venues deserve every bit of operational pain and damaged reputation that will result.
>I am troubled by people using an LLM at all to write academic research papers.
I'm an outsider to the academic system. I have cool projects that I feel push some niche application to SOTA in my tiny little domain, which is publishable based on many of the papers I've read.
If I can build a system that does a thing, I can benchmark it and prove it's better than previous papers; my main blocker is getting all my work and information into the "Arxiv PDF" format and tone. Seems like a good use of LLMs to me.
I would argue that an LLM is a perfectly sensible tool for structure-preserving machine translation from another language to English. (Where by "another language", you could also substitute "very poor/non-fluent English." Though IMHO that's a bit silly, even though it's possible; there's little sense in writing in a language you only half know, when you'd get a less-lossy result from just writing in your native tongue and then having the LLM translate from that.)
Google Translate et al were never good enough at this task to actually allow people to use the results for anything professional. Previous tools were limited to getting a rough gloss of what words in another language mean.
But LLMs can be used in this way, and are being used in this way; and this is increasingly allowing non-English-fluent academics to publish papers in English-language journals (thus engaging with the English-language academic community), where previously those academics may have felt "stuck" publishing in what few journals exist for their discipline in their own language.
Would you call the use of LLMs for translation "shoddy" or "irresponsible"? To me, it'd be no more and no less "shoddy" or "irresponsible" than it would be to hire a freelance human translator to translate the paper for you. (In fact, the human translator might be a worse idea, as LLMs are more likely to understand how to translate the specific academic jargon of your discipline than a randomly-selected human translator would be.)
To that point, I think it's lovely how LLMs democratize science. At ICLR a few years ago I spoke with a few Korean researchers who were delighted that their relative inability to write in English was no longer being held against them during the review process. I think until then I underestimated how pivotal this technology was in lowering the barrier to entry for the non-English-speaking scientific community.
Autotranslating technical texts is very hard. After the translation, you must check that all the technical words were translated correctly, rather than replaced by a fancy synonym that does not make sense.
(A friend has an old book translated a long time ago (by a human) from Russian to Spanish. Instead of "complex numbers", the book calls them "complicated numbers". :) )
If they can write a whole draft in their first language, they can easily read the translated English version and correct it. The errors described by gp/op arise when authors directly ask an LLM to generate a full paragraph of text. Look at my terrible English; I really have been through the full process from draft to English version before :)
I'm surprised by these results. I agree that LLMs are a great tool for offsetting the English-speaking world's advantage. I would have expected non-Anglo-American universities to rank at the top of the list. One of the most valuable features of LLMs from the beginning has been their ability to improve written language.
Why is their use more intense in English-speaking universities?
We still do not have a standardized way to represent machine learning concepts. For example, in vision models, I see lots of papers confuse "skip connections" with "residual connections": when they concatenate channels they call it a "residual connection", which shows they haven't understood why we call them "residual" in the first place. In my humble opinion, each conference (or better, a confederation of conferences) should work together to provide a glossary, a technical guideline, and also a special machine-translation tool, to correct unclear, grammatical-error-riddled English like mine!
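A minimal sketch of the distinction (a toy illustration of my own, not from any particular paper; `block` is a hypothetical stand-in for a learned layer):

```python
import numpy as np

def block(x):
    # Stand-in for a learned transformation (e.g., a conv layer);
    # a hypothetical toy function purely for illustration.
    return 0.5 * x

x = np.ones((4, 8))  # (batch, channels)

# Residual connection: the block's output is ADDED to its input,
# so the network learns a residual F(x) = H(x) - x. Shapes must match.
residual_out = x + block(x)  # shape stays (4, 8)

# Channel concatenation (the "skip connection" of U-Net/DenseNet style
# architectures): feature maps are stacked, so the channel count grows.
# Calling this a "residual connection" misses the additive meaning.
concat_out = np.concatenate([x, block(x)], axis=1)  # shape (4, 16)

print(residual_out.shape, concat_out.shape)  # → (4, 8) (4, 16)
```

Addition preserves shape and defines a residual; concatenation changes shape and defines no residual at all, which is exactly why the terms shouldn't be used interchangeably.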
Good point. There may be a place for LLMs for science writing translation (hopefully not adding nor subtracting anything) when you're not fluent in the language of a venue.
You need a way to validate the correctness of the translation, and to be able to stand behind whatever the translation says. And the translation should be disclosed on the paper.
> And also plagiarism, when you claim authorship of it.
I don't actually mind putting Claude as a co-author on my github commits.
But for papers there are usually so many tools involved. It would be crowded to include each of Claude, Gemini, Codex, Mathematica, Grammarly, Translate etc. as co-authors, even though I used all of them for some parts.
Maybe just having a "tools used" section could work?
I suspect the parent post was concerned about plagiarizing the author of training data; not software tools.
There are legitimate, non-cheating ways to use LLMs for writing. I often use the wrong verb forms ("They synthesizes the ..."), write "though" when it should be "although", and forget to comma-separate clauses. LLMs are perfect for that. Generating text from scratch, however, is wrong.
I agree, but I don't think any of the broadly acceptable uses would result in easily identifiable flaws like those in the post, especially hallucinated URLs.
I do similar proofing (especially spelling), but you need to be very careful, as it will nudge you toward specific styles that rob originality.
> I often ... write "though" when it should be "although"
That is a purely imaginary "error". Anywhere you can use 'although', you are free to use 'though' instead.
> It's a shoddy, irresponsible way to work. And also plagiarism, when you claim authorship of it.
It reminds me of kids these days and their fancy calculators! Those new fangled doohickeys just aren't reliable, and the kids never realize that they won't always have a calculator on them! Everyone should just do it the good old fashioned way with slide rules!
Or these darn kids and their unreliable sources like Wikipedia! Everyone knows that you need a nice solid reliable source that's made out of dead trees and fact checked by up to 3 paid professionals!
I doubt that it's common for anyone to read a research paper and then question whether the researcher's calculator was working reliably.
Sure, maybe someday LLMs will be able to report facts in a mostly reliable fashion (like a typical calculator), but we're definitely not even close to that yet, so until we are the skepticism is very much warranted. Especially when the details really do matter, as in scientific research.
I'm really not moved by this argument; it seems a false equivalence. It's not merely a spell checker or removing some tedium.
As a professional mathematician, I use Wikipedia all the time to look up quick facts before verifying them myself or elsewhere. As for a calculator, well, I can use an actual programming language.
Up until this point, neither of those tools was advertised or used by people to entirely replace human input.
> Those new fangled doohickeys just aren't reliable
Except they are (unlike a chatbot, a calculator is perfectly deterministic), and the unreliability of LLMs is one of their most, if not the most, widespread target of criticism.
Low effort doesn't even begin to describe your comment.
Annoying dismissal.
In an academic paper, you condense a lot of thinking and work, into a writeup.
Why would you blow off the writeup part, and impose AI slop upon the reviewers and the research community?
One issue with this analogy is that calculators really are precise when used correctly. LLMs are not.
I do think they can be used in research but not without careful checking. In my own work I’ve found them most useful as search aids and brainstorming sounding boards.
>also plagiarism
To me, this is a reminder of how much of a specific minority this forum is.
Nobody I know in real life, personally or at work, has expressed this belief.
I have literally only ever encountered this anti-AI extremism (extremism in the non-pejorative sense) in places like reddit and here.
Clearly, the authors in NeurIPS don't agree that using an LLM to help write is "plagiarism", and I would trust their opinions far more than some random redditor.
> Nobody I know in real life, personally or at work, has expressed this belief.
TBF, most people in real life don't even know how AI works to any degree, so using that as an argument that parent's opinion is extreme is kind of circular reasoning.
> I have literally only ever encountered this anti-AI extremism (extremism in the non-pejorative sense) in places like reddit and here.
I don't see parent's opinions as anti-AI. It's more an argument about what AI is currently, and what research is supposed to be. AI is existing ideas. Research is supposed to be new ideas. If much of your research paper can be written by AI, I call into question whether or not it represents actual research.
I find that hard to believe. Every creative professional that I know shares this sentiment. That’s several graphic designers at big tech companies, one person in print media, and one visual effects artist in the film industry. And once you include many of their professional colleagues that becomes a decent sample size.
The LLM model and version should be included as an author so there's useful information about where the content came from.
Higher education is not free. People pay a shit ton of money to attend and also governments (taxpayers) invest a lot. Imagine offloading your research to an AI bot...
> AI Overview
> Plagiarism is using someone else's words, ideas, or work as your own without proper credit, a serious breach of ethics leading to academic failure, job loss, or legal issues, and can range from copying text (direct) to paraphrasing without citation (mosaic), often detected by software and best avoided by meticulous citation, quoting, and paraphrasing to show original thought and attribution.
“Anti-AI extremism”? Seriously?
Where does this bizarre impulse to dogmatically defend LLM output come from? I don’t understand it.
If AI is a reliable and quality tool, that will become evident without the need to defend it - it’s got billions (trillions?) of dollars backstopping it. The skeptical pushback is WAY more important right now than the optimistic embrace.
> Clearly, the authors in NeurIPS don't agree that using an LLM to help write is "plagiarism",
Or they didn't consider that it arguably fell within academia's definition of plagiarism.
Or they thought they could get away with it.
Why is someone behaving questionably the authority on whether that's OK?
> Nobody I know in real life, personally or at work, has expressed this belief. I have literally only ever encountered this anti-AI extremism (extremism in the non-pejorative sense) in places like reddit and here.
It's not "anti-AI extremism".
If no one you know has said, "Hey, wait a minute, if I'm copy&pasting this text I didn't write, and putting my name on it, without credit or attribution, isn't that like... no... what am I missing?" then maybe they are focused on other angles.
That doesn't mean that people who consider different angles than your friends do are "extremist".
They're only "extremist" in the way that anyone critical at all of 'crypto' was "extremist", to the bros pumping it. Not coincidentally, there's some overlap in bros between the two.
Yup, and no matter how flimsy an anti-AI article is, it will skyrocket to the top of HN because of it. It makes sense, though: HN users are the most likely to feel threatened by LLMs, and therefore are more likely to be anxious about them.
I don’t love ai either, but that’s the truth.
This seems like finding spelling errors and using them to cast the entire paper into doubt.
I am unconvinced that the particular error mentioned above is a hallucination, and even less convinced that it is a sign of some kind of rampant use of AI.
I hope to find better examples later in the comment section.
I actually believe it was an AI hallucination, but I agree with you that it seems the problem is far more concentrated to a few select papers (e.g., one paper made up more than 10% of the detected errors).
What's the big deal with one dead canary? This coal mine's productivity is at record highs!
Why don't you look at the actual article? There are several more egregious examples, e.g., the authors being cited as "John Smith and Jane Doe"
I can see that either way. It could also be a placeholder until the actual author list is inserted. This could happen if you know the title, but not the authors and insert a temporary reference entry.
The problem is, 10 years ago when I was still publishing, even I would let an incorrect citation go through because of an old BibTeX file or some such.
Yeah, errors of omission are so common that "Errors and Omissions" is a category of professional liability insurance.
> However: we do not know if these are the only errors, they are merely a signature that the paper was submitted without being thoroughly checked for hallucinations
Given how stupidly tedious and error-prone citations are, I have no trouble believing that the citation error could be the only major problem with the paper, and that it's not a sign of low quality by itself. It would be another matter entirely if we were talking about something actually important to the ideas presented in the paper, but it isn't.
Agreed, I don't find this to be evidence of AI. It often happens that authors change, there are multiple venues, or I'm using an old version of the paper. We also need to see the denominator: did this Google paper have this one bad citation out of 20 references, or out of 60?
Also, everyone I know has been relying on Google Scholar for 10+ years. Is that AI-ish? There are definitely errors on there. If you would extrapolate from citation issues to the content in the age of LLMs, were you doing so then as well?
It's the age-old debate about spelling/grammar issues in technical work. In my experience it rarely gets to the point that these errors (e.g., from non-native speakers) affect my interpretation. Others claim to infer shoddy content from them.
Google Scholar and the vagaries of copy/paste have mangled BibTeX entries ever since it became a thing; a single citation with these sorts of errors may not even be AI, just "normal" mistakes.