Comment by neilv
18 hours ago
> If these are the only errors, we are not troubled. However: we do not know if these are the only errors, they are merely a signature that the paper was submitted without being thoroughly checked for hallucinations. They are a signature that some LLM was used to generate parts of the paper and the responsible authors used this LLM without care.
I am troubled by people using an LLM at all to write academic research papers.
It's a shoddy, irresponsible way to work. And also plagiarism, when you claim authorship of it.
I'd see a failure of the 'author' to catch hallucinations as more like a failure to hide evidence of misconduct.
If academic venues are saying that using an LLM to write your papers is OK ("so long as you look it over for hallucinations"?), then those academic venues deserve every bit of operational pain and damaged reputation that will result.
I would argue that an LLM is a perfectly sensible tool for structure-preserving machine translation from another language to English. (Where by "another language", you could also substitute "very poor/non-fluent English." Though IMHO that's a bit silly, even though it's possible; there's little sense in writing in a language you only half know, when you'd get a less-lossy result from just writing in your native tongue and then having the LLM translate from that.)
Google Translate et al were never good enough at this task to actually allow people to use the results for anything professional. Previous tools were limited to getting a rough gloss of what words in another language mean.
But LLMs can be used in this way, and are being used in this way; and this is increasingly allowing non-English-fluent academics to publish papers in English-language journals (thus engaging with the English-language academic community), where previously they may have felt "stuck" publishing in what few journals exist for their discipline in their own language.
Would you call the use of LLMs for translation "shoddy" or "irresponsible"? To me, it'd be no more and no less "shoddy" or "irresponsible" than it would be to hire a freelance human translator to translate the paper for you. (In fact, the human translator might be a worse idea, as LLMs are more likely to understand how to translate the specific academic jargon of your discipline than a randomly-selected human translator would be.)
To that point, I think it's lovely how LLMs democratize science. At ICLR a few years ago I spoke with a few Korean researchers who were delighted that their relative inability to write in English was no longer being held against them during the review process. I think until then I had underestimated how pivotal this technology was in lowering the barrier to entry for the non-English-speaking scientific community.
Autotranslating technical texts is very hard. After the translation, you must check that all the technical words were translated correctly, rather than as a fancy synonym that does not make sense.
(A friend has an old book translated a long time ago (by a human) from Russian to Spanish. Instead of "complex numbers", the book calls them "complicated numbers". :) )
The convenient thing in this case (verification of translation of academic papers from the speaker's native language to English) is that the authors of the paper likely already 1. can read English to some degree, and 2. are highly likely to be familiar specifically with the jargon terms of their field in both their own language and in English.
This is because, even in countries with a different primary spoken language, many academic subjects, especially at a graduate level (masters/PhD programs — i.e. when publishing starts to matter), are still taught at universities at least partly in English. The best textbooks are usually written in English (with acceptably-faithful translations of these texts being rarer than you'd think); all the seminal papers one might reference are likely to be in English; etc. For many programs, the ability to read English to some degree is a requirement for attendance.
And yet these same programs are also likely to provide lectures (and TA assistance) in the country's own native language, with the native-language versions of the jargon terms used. And any collaborative work is likely to also occur in the native language. So attendees of such programs end up exposed to both the native-language and English-language terms within their field.
This means that academics in these places often have very little trouble in verifying the fidelity of translation of the jargon in their papers. It's usually all the other stuff in the translation that they aren't sure is correct. But this can be cheaply verified by handing the paper to any fluently-multilingual non-academic and asking them to check the translation, with the instruction to just ignore the jargon terms because they were already verified.
When reading technical material in my native language, I sometimes need to translate it back to English to fully understand it.
I remember one time when I had written a bunch of user facing text for an imaging app and was reviewing our French translation. I don't speak French but I was pretty sure "plane" (as in geometry) shouldn't be translated as "avion". And this was human translated!
idk, I think Gemini 2.5 did a great job at translating almost all research math papers from French to English...
If they can write a whole draft in their first language, they can easily read the translated English version and correct it. The errors described by gp/op arise when authors directly ask the LLM to generate a full paragraph of text. Look at my terrible English; I really have been through the full process from draft to English version before :)
I'm surprised by these results. I agree that LLMs are a great tool for offsetting the English-speaking world's advantage. I would have expected non-Anglo-American universities to rank at the top of the list. One of the most valuable features of LLMs from the beginning has been their ability to improve written language.
Why is their use more intense in English-speaking universities?
We still do not have a standardized vocabulary for machine-learning concepts. For example, in vision models I see lots of papers confuse "skip connections" and "residual connections": when they concatenate channels they call it a "residual connection", which shows they haven't understood why we call them "residual" in the first place. In my humble opinion, each conference, or better a confederation of conferences, should work together to provide a glossary, a technical guideline, and also a special machine-translation tool, to correct non-clear, grammatical-error-riddled English like mine!
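To make the distinction concrete, here's a minimal sketch (PyTorch assumed; the block names are my own illustration, not from any particular paper). A residual connection adds F(x) back onto x, so the conv only has to learn the residual relative to the identity; concatenating channels, as in DenseNet or U-Net skip connections, is a different operation:

    # Minimal sketch (PyTorch assumed; illustrative, not from any paper).
    import torch
    import torch.nn as nn

    class ResidualBlock(nn.Module):
        def __init__(self, channels: int):
            super().__init__()
            self.conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # Residual connection: output = x + F(x); channel count unchanged.
            return x + self.conv(x)

    class ConcatSkipBlock(nn.Module):
        def __init__(self, channels: int):
            super().__init__()
            self.conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # Concatenation skip: channels double; nothing "residual" here.
            return torch.cat([x, self.conv(x)], dim=1)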
Good point. There may be a place for LLMs for science writing translation (hopefully not adding nor subtracting anything) when you're not fluent in the language of a venue.
You need a way to validate the correctness of the translation, and to be able to stand behind whatever the translation says. And the translation should be disclosed on the paper.
> And also plagiarism, when you claim authorship of it.
I don't actually mind putting Claude as a co-author on my GitHub commits.
But for papers there are usually so many tools involved. It would be crowded to include each of Claude, Gemini, Codex, Mathematica, Grammarly, Translate etc. as co-authors, even though I used all of them for some parts.
Maybe just having a "tools used" section could work?
>also plagiarism
To me, this is a reminder of how much of a specific minority this forum is.
Nobody I know in real life, personally or at work, has expressed this belief.
I have literally only ever encountered this anti-AI extremism (extremism in the non-pejorative sense) in places like reddit and here.
Clearly, the authors in NeurIPS don't agree that using an LLM to help write is "plagiarism", and I would trust their opinions far more than some random redditor.
> Nobody I know in real life, personally or at work, has expressed this belief.
TBF, most people in real life don't even know how AI works to any degree, so using that as an argument that parent's opinion is extreme is kind of circular reasoning.
> I have literally only ever encountered this anti-AI extremism (extremism in the non-pejorative sense) in places like reddit and here.
I don't see parent's opinions as anti-AI. It's more an argument about what AI is currently, and what research is supposed to be. AI is existing ideas. Research is supposed to be new ideas. If much of your research paper can be written by AI, I call into question whether or not it represents actual research.
> Research is supposed to be new ideas. If much of your research paper can be written by AI, I call into question whether or not it represents actual research.
One would hope the authors are forming a hypothesis, performing an experiment, gathering and analysing results, and only then passing it to the AI to convert it into a paper.
If I have a theory that, IDK, laser welds in a sine wave pattern are stronger than laser welds in a zigzag pattern - I've still got to design the exact experimental details, obtain all the equipment and consumables, cut a few dozen test coupons, weld them, strength test them, and record all the measurements.
Obviously if I skipped the experimentation and just had an AI fabricate the results table, that's academic misconduct of the clearest form.
"If much of your research paper can be written by AI, I call into question whether or not it represents actual research" And what happens to this statement if next year or later this year the papers that can be autonomously written passes median human paper mark?
> TBF, most people in real life don't even know how AI works to any degree
How about the authors who do research for NeurIPS? Do they know how AI works?
I find that hard to believe. Every creative professional that I know shares this sentiment. That’s several graphic designers at big tech companies, one person in print media, and one visual effects artist in the film industry. And once you include many of their professional colleagues that becomes a decent sample size.
Graphic design is a completely different kettle of fish. Comparing it to academic paper writing is disingenuous.
The LLM model and version should be included as an author so there's useful information about where the content came from.
Higher education is not free. People pay a shit ton of money to attend and also governments (taxpayers) invest a lot. Imagine offloading your research to an AI bot...
> AI Overview
> Plagiarism is using someone else's words, ideas, or work as your own without proper credit, a serious breach of ethics leading to academic failure, job loss, or legal issues, and can range from copying text (direct) to paraphrasing without citation (mosaic), often detected by software and best avoided by meticulous citation, quoting, and paraphrasing to show original thought and attribution.
> Clearly, the authors in NeurIPS don't agree that using an LLM to help write is "plagiarism",
Or they didn't consider that it arguably fell within academia's definition of plagiarism.
Or they thought they could get away with it.
Why is someone behaving questionably the authority on whether that's OK?
> Nobody I know in real life, personally or at work, has expressed this belief. I have literally only ever encountered this anti-AI extremism (extremism in the non-pejorative sense) in places like reddit and here.
It's not "anti-AI extremism".
If no one you know has said, "Hey, wait a minute, if I'm copy&pasting this text I didn't write, and putting my name on it, without credit or attribution, isn't that like... no... what am I missing?" then maybe they are focused on other angles.
That doesn't mean that people who consider different angles than your friends do are "extremist".
They're only "extremist" in the way that anyone critical at all of 'crypto' was "extremist", to the bros pumping it. Not coincidentally, there's some overlap in bros between the two.
> Why is someone behaving questionably the authority on whether that's OK?
Because they are not. Using AI to help with writing is something literally every company is pushing for.
Yup, and no matter how flimsy an anti-AI article is, it will skyrocket to the top of HN because of it. It makes sense, though: HN users are the most likely to feel threatened by LLMs, and therefore more likely to be anxious about them.
I don't love AI either, but that's the truth.
Strange, I find it quite the opposite: "pro-AI" comments especially are often at the top of the list.
“Anti-AI extremism”? Seriously?
Where does this bizarre impulse to dogmatically defend LLM output come from? I don’t understand it.
If AI is a reliable and quality tool, that will become evident without the need to defend it - it’s got billions (trillions?) of dollars backstopping it. The skeptical pushback is WAY more important right now than the optimistic embrace.
The fact that there is absurd AI hype right now doesn't mean that we should let equally absurd bullshit pass on the other side of the spectrum. Having a reasonable and accurate discussion about the benefits, drawbacks, side effects, etc. is WAY more important right now than being flagrantly incorrect in either direction.
Meanwhile, this entire comment thread is about what appears to be, as fumi2026 points out in their comment, a predatory marketing play by a startup hoping to capitalize on the exact sort of anti-AI sentiment that you seem to think is important... just because there is pro-AI sentiment?
Naming and shaming everyday researchers based on the idea that they have let hallucinations slip into their paper, all because your own AI model has decided that it was AI, so you can signal-boost your product, seems pretty shitty and exploitative to me, and it is only viable as a product and marketing strategy because of the visceral anti-AI sentiment in some places.
There are legitimate, non-cheating ways to use LLMs for writing. I often use the wrong verb forms ("They synthesizes the ..."), write "though" when it should be "although", and forget to comma-separate clauses. LLMs are perfect for that. Generating text from scratch, however, is wrong.
I agree, but I don't think any of the broadly acceptable uses would result in easily identifiable flaws like those in the post, especially hallucinated URLs.
I do similar proofing (esp. spelling), but you need to be very careful, as it will nudge you toward specific styles that rob originality.
> I often ... write "though" when it should be "although"
That is a purely imaginary "error". Anywhere you can use 'although', you are free to use 'though' instead.
Yeah, but you cannot use although anywhere you can use though, though.
> It's a shoddy, irresponsible way to work. And also plagiarism, when you claim authorship of it.
It reminds me of kids these days and their fancy calculators! Those newfangled doohickeys just aren't reliable, and the kids never realize that they won't always have a calculator on them! Everyone should just do it the good old-fashioned way with slide rules!
Or these darn kids and their unreliable sources like Wikipedia! Everyone knows that you need a nice solid reliable source that's made out of dead trees and fact-checked by up to 3 paid professionals!
I doubt that it's common for anyone to read a research paper and then question whether the researcher's calculator was working reliably.
Sure, maybe someday LLMs will be able to report facts in a mostly reliable fashion (like a typical calculator), but we're definitely not even close to that yet, so until we are the skepticism is very much warranted. Especially when the details really do matter, as in scientific research.
> whether the researcher's calculator was working reliably.
LLMs do not work reliably; that's not their purpose.
If you use them that way, it's akin to using a butter knife as a screwdriver. You might get away with it once or twice, but then you slip and stab yourself. Better to go find a screwdriver if you need reliable.
> I doubt that it's common for anyone to read a research paper and then question whether the researcher's calculator was working reliably
Reproducibility and repeatability in the sciences?
See Wikipedia's "Replication crisis" article: Causes > Problems with the publication system in science > Mathematical errors; Causes > Questionable research practices > In AI research; and Remedies (open science, reproducible workflows, disclosure, ...): https://en.wikipedia.org/wiki/Replication_crisis#Mathematica...
Already, verifiable proofs run to impossibly many pages for human review.
There are "verify each Premise" and "verify the logical form of the Argument" (P, therefore Q) steps that the model still doesn't do for the user.
For your domain, how insufficient is the output given a process as a prompt like:

- Identify hallucinations from models prior to (date in the future)
- Check each sentence of this: ```{...}```
- Research ScholarlyArticles (and then their Datasets) which support and which reject your conclusions. Critically review findings and controls.
- Suggest code to write to apply data-science principles to proving correlative and causative relations, given already-collected observations.
- Design experiment(s), given the scientific method, to statistically prove causative (and also correlative) relations.
- Identify a meta-analytic workflow (process, tools, schema, and maybe code) for proving what is suggested by this chat.
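If you wanted to script that loop rather than paste each prompt by hand, here is a minimal hedged sketch using the OpenAI Python client (the model name and the trimmed-down prompt list are illustrative assumptions on my part, not a tested pipeline):

    # Hedged sketch: chain review prompts through a chat model, carrying
    # the conversation forward so each step can see earlier answers.
    # Model name and prompt wording are illustrative assumptions.
    from openai import OpenAI

    client = OpenAI()

    PROMPTS = [
        "Check each sentence of this for hallucinations: ```{text}```",
        "Research scholarly articles (and their datasets) which support and "
        "which reject your conclusions. Critically review findings and controls.",
        "Design experiment(s), per the scientific method, to statistically "
        "test the causative and correlative relations suggested so far.",
    ]

    def run_review(text: str) -> list[str]:
        messages: list[dict] = []
        replies: list[str] = []
        for prompt in PROMPTS:
            messages.append({"role": "user", "content": prompt.format(text=text)})
            resp = client.chat.completions.create(model="gpt-4o", messages=messages)
            answer = resp.choices[0].message.content or ""
            messages.append({"role": "assistant", "content": answer})
            replies.append(answer)
        return replies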
I'm really not motivated by this argument; it seems a false equivalence. It's not merely a spell checker or removing some tedium.
As a professional mathematician, I used Wikipedia all the time to look up quick facts before verifying them myself or elsewhere. A calculator? Well, I can use an actual programming language.
Up until this point, neither of those tools was advertised or used by people to entirely replace human input.
There are some interesting possibilities for LLMs in math, especially in terms of generating machine-checked proofs using languages like Lean. But this is a supplement to the actual result, where the LLM would actually be adding a more rigorous version of a human's argument with all the boring steps included.
In a few cases, I've seen Terence Tao point out examples of LLMs actually finding proofs of open problems unassisted. Not necessarily problems anyone cared deeply about. But there's still the fact that if the proof holds, then it's valid no matter who or what came up with it.
So it's complicated I guess?
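As a toy illustration of what "machine-checked" buys you, here's a deliberately trivial Lean 4 example (my own, not one of the results mentioned): once the kernel accepts the proof term, every step has been verified, no matter who or what wrote it.

    -- Trivial Lean 4 example: `Nat.add_comm` is a core lemma, and the
    -- kernel checks the whole proof term; authorship is irrelevant.
    theorem add_comm' (a b : Nat) : a + b = b + a :=
      Nat.add_comm a b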
I hate to sound like a 19-year-old on Reddit, but:
AI People: "AI is a completely unprecedented technology where its introduction is unlike the introduction of any other transformative technology in history! We must treat it totally differently!"
Also AI People: "You're worried about nothing, this is just like when people were worried about the internet."
> Those new fangled doohickeys just aren't reliable
Except they are (unlike a chatbot, a calculator is perfectly deterministic), and the unreliability of LLMs is one of their most widespread targets of criticism, if not the most.
Low effort doesn't even begin to describe your comment.
As low-effort as you hand-waving away any nuance because it doesn't agree with you?
> Except they are (unlike a chatbot, a calculator is perfectly deterministic)
LLMs are supposed to be stochastic. That is not a bug; I can see why you find that disappointing, but it's just the reality of the tool.
However, as I mentioned elsewhere, calculators also have bugs, and those bugs make their way into scientific research all the time. Floating-point errors are particularly common, as are order-of-operations problems, because physical devices get it wrong frequently and are hard to patch. Worse, they are not SUPPOSED TO BE stochastic, so when they fail nobody notices until it's far too late. [0 - PDF]
Further, spreadsheets are no better: for example, a scan of ~3,600 genomics papers found that about 1 in 5 had gene-name errors (e.g., SEPT2 → "2-Sep") because that's how Excel likes to format things. [1] Again, this is much worse than a stochastic machine doing its stochastic job... because it's not SUPPOSED to be random; it's just broken, and on a truly massive scale.
[0] https://ttu-ir.tdl.org/server/api/core/bitstreams/7fce5b73-1...
[1] https://www.washingtonpost.com/news/wonk/wp/2016/08/26/an-al...
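For anyone who hasn't hit it, the floating-point failure mode is trivial to reproduce (plain Python; IEEE 754 double precision, the same arithmetic Excel uses):

    # IEEE 754 double precision: the "obvious" decimal answer quietly
    # isn't exact, and nothing warns you. Same class of silent error
    # as the calculator and spreadsheet bugs above.
    print(0.1 + 0.2)           # 0.30000000000000004
    print(0.1 + 0.2 == 0.3)    # False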
Annoying dismissal.
In an academic paper, you condense a lot of thinking and work, into a writeup.
Why would you blow off the writeup part, and impose AI slop upon the reviewers and the research community?
I don't necessarily disagree, but researchers are not required to be good communicators. An academic can lead their field and be a terrible lecturer. A specialist can let a generalist help explain concepts for them.
They should still review the final result though. There is no excuse for not doing that.
One issue with this analogy is that calculators really are precise when used correctly. LLMs are not.
I do think they can be used in research but not without careful checking. In my own work I’ve found them most useful as search aids and brainstorming sounding boards.
> I do think they can be used in research but not without careful checking.
Of course you are right. It is the same with all tools, calculators included: if you use them improperly, you get poor results.
In this case they're stochastic, which isn't something people are used to from computers yet. You have to understand that and learn how to use them, or you will get poor results.
> One issue with this analogy is that calculators really are precise when used correctly. LLMs are not.
I made this a separate comment because it's wildly off-topic, but... they actually aren't, especially for very large numbers or for high precision. When's the last time you did a firmware update on yours?
It's fairly trivial to find lists of calculator flaws and then identify them in research papers. I recall reading a research paper about it in the '00s.
One issue with this analogy is that paper encyclopedias really are precise when used correctly. Wikipedia is not.
I do think it can be used in research but not without careful checking. In my own work I've found it most useful as a search aid and for brainstorming.
^ this same comment 10 years ago