Comment by dataflow
15 hours ago
> This is incredibly good for science.
I disagree. It's just one darn hallucinated citation for heaven's sake, not fraud or something. It doesn't account for the substance or quality of their work at all. A one-year ban seems plenty sufficient for a minor first time mistake like this. People make mistakes and a good fraction of them can learn from those mistakes. There's no need to permanently cripple someone's ability to progress their life or contribute to humanity just because an AI hallucinated a reference one time in their life. That's punitive instead of rehabilitative.
> It's just one darn hallucinated citation for heaven's sake, not fraud or something.
It is fraud.
> It doesn't account for the substance or quality of their work at all.
References are part of the work. If you're making up the references, what else are you making up?
> People make mistakes and a good fraction of them can learn from those mistakes. There's no need to permanently cripple someone's ability to progress their life or contribute to humanity just because an AI hallucinated a reference one time in their life.
A one year ban is not permanent. Having a negative consequence for making poor decisions seems like an inducement to learn from the mistake?
In an ideal world, one would keep notes on the references used while doing the research that led to writing the paper. Choosing not to do that is one poor decision.
Taking a charitable view: if one asks an AI to suggest references that may have been missed, one should at least verify that the references exist and are relevant. Choosing not to do that is also a poor decision, even if one did take notes on references used while researching.
In a far-less-than-ideal world, authors are at least referencing papers whose title and abstract they have read. In an ideal world, authors would only reference works they have read in their entirety. I don't think we need to live in the ideal world[0], but let's also not pretend the ideal world is even remotely out of reach. Let's also be honest that, in the current setting, a lot of citations are being used to encourage a work to be accepted, more than for their utility to the paper. The average ML paper is now 8 pages long with >50 citations. That's crazy.
[0] References can be entire textbooks, which is potentially too high of a bar
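The "verify the references exist" step can even be partly mechanized. As a rough sketch (the field names and sample data below are hypothetical, not taken from any paper discussed in this thread), a script can flag reference entries whose DOI doesn't even match the standard `10.<registrant>/<suffix>` shape before anyone bothers resolving the survivors against a registry:

```python
import re

# A DOI has the shape "10.<registrant code>/<suffix>", per the DOI Handbook.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")

def flag_suspect_refs(refs):
    """Return reference entries that fail cheap local sanity checks.

    `refs` is a list of dicts with (hypothetical) keys 'title', 'year',
    'doi'.  A malformed or missing DOI does not prove a citation is
    fabricated, but it is a near-free first filter to run before any
    manual or registry-based verification.
    """
    suspects = []
    for ref in refs:
        doi = ref.get("doi", "")
        if not DOI_PATTERN.match(doi):
            suspects.append(ref)
    return suspects

refs = [
    {"title": "Attention Is All You Need", "year": 2017,
     "doi": "10.48550/arXiv.1706.03762"},
    {"title": "A Plausible-Sounding Paper", "year": 2024,
     "doi": "not-a-doi"},
]
print([r["title"] for r in flag_suspect_refs(refs)])
# -> ['A Plausible-Sounding Paper']
```

A regex match obviously cannot prove a reference is real; the point is that this check costs seconds, while resolving each surviving DOI against a public registry endpoint such as Crossref's `https://api.crossref.org/works/<doi>` closes the loop.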
Even as a human, you can still fuck up references.
I submitted a paper with a reference author as Elisio because I couldn’t read my own handwriting. After submitting, I double checked all the references through an LLM. It pointed out that their name was actually Enrique. Yes, you should probably double check your references before submitting, not after.
Point is, I didn’t even trust the LLM at first. But after verifying the mistake, I was embarrassed af. I resubmitted with the fixes before it went live, but ultimately, what’s the difference between “mistake” and “hallucination”?
If you write your own paper (mostly) and choose your own references (because you've actually read the papers) you won't have a problem.
> It is fraud.
I think we are talking semantics here.
While fraud does require intent to deceive, I get the sentiment that hallucinated citations shouldn't be dismissed as mere carelessness. They should be treated as something stronger: gross negligence, or worse. There should absolutely be repercussions for this.
But let's not call it fraud. That word is reserved for something specific.
EDIT: someone else said "reckless disregard" equals intent, or something to that effect. So I looked it up.
It appears that is the case: "reckless disregard equals intent" in legal language.
But I am not sure this particular clause should apply here. Perhaps it depends on what kind of research is being published? For example, if it is related to medical science and has real consequences for people's health, we could then apply it?
I do believe this policy is appropriate to deal with the reckless disregard of posting hallucinated references.
It's a conscious decision to not take the time to check your AI output, and instead waste a whole bunch of other people's time letting them essentially do that for you in duplicate.
Feels like that should disqualify you from participation for a bit. Intent or no intent.
The intent to deceive is there. The deception is representing, at submission, that this is a scholarly piece of work in which, among many other things, you know the citations to be accurate. That false representation was knowingly and intentionally made at the time of submission.
The citation being incorrect is merely the proof of deception not the (relevant) deception itself.
Fraud is the correct description provided (and this is practically a guarantee) you intended to benefit from the submission of the paper (e.g. by bolstering your resume).
I think (though I might well be misunderstanding) that reckless disregard is taken to be an intentional choice, but that it does not imply the outcome itself was intentional. There is a difference between intentionally doing something that you know has a high risk of a bad outcome (without being able to predict that outcome) and intentionally seeking a particular, legally disallowed outcome.
But what LPisGood was saying is that reckless disregard (as opposed to explicit intent) is sufficient to meet the legal bar for fraud.
> In an ideal world, one would be keeping notes on references used while doing the research that lead to writing the paper. Choosing not to do that is one poor decision.
In this book
https://news.ycombinator.com/item?id=44022957
there is this passage on p. 127:
"Any author citing another paper should be required to provide proof that they a) possess a copy of that paper, b) have read that paper, c) have read the paper carefully."
> It is fraud.
No, it is emphatically not. Fraud requires intent to deceive.
> A one year ban is not permanent.
...what text are you reading? Nobody called the one-year ban permanent, and I wasn't even arguing against it. I was literally in favor of it in my comment; I explicitly said it is already plenty sufficient. What I said is that there's no need to go beyond that, and my entire gripe was that they very much are going beyond it with a permanent penalty. Did you completely miss where they said "...followed by the requirement that subsequent arXiv submissions must first be accepted at a reputable peer-reviewed venue"?
Fraud requires intent to deceive _or_ reckless disregard, sometimes called "conscious indifference", for the veracity of the statement asserted.
If you are using AI-hallucinated references in scientific papers, then there is some obvious intent to deceive there.
> No, it is emphatically not. Fraud requires intent to deceive.
I'm about as pro-AI-as-a-research-and-writing-assistant and anti-AI-witch-hunt as they come, but I simply cannot parse what I've quoted here.
Posting slop to arxiv is blatant deception. Posting an article is an attestation that the article is a genuine engagement with the literature. If you're posting things to arxiv that are not sincere engagements with the literature, you are attempting to deceive.
> followed by the requirement that subsequent arXiv submissions must first be accepted at a reputable peer-reviewed venue
This part seemed reasonable too. I'm not in academia, but my understanding is most people writing papers intend for them to be accepted by reputable peer-reviewed venues, but post to arXiv because those venues don't always allow for simple distribution.
If your papers aren't going to be accepted at reputable venues and you posted slop to arXiv before (and they noticed it!), seems reasonable that they only want reputable stuff from you in the future?
It's very silly, but not a big deal. arXiv is becoming irrelevant these days anyway.
In fact, it would be better if they just banned AI, so we could get off the luddite platforms.
Automated research is the future, end of story. And really, it couldn't have come at a better time, given the increasingly diminishing returns on human-powered research.
Poe's law striking hard.
If automated research is the future, it has to be research, not making stuff up.
Which of those two does "hallucinated references" fit into?
A "mistake" would be a typo in a real citation. A hallucinated citation is evidence of just plain laziness and negligence, which taints the entire submission.
No it is not. Seriously. All you need for this to happen is for your lab partner to ask AI to add a missing citation that they are already familiar with at the last minute before a midnight submission deadline, and for the AI to hallucinate something else, and for them to honestly miss this. It does not even imply any involvement on your part, let alone that either of you were lazy or negligent on the actual research or substance of the paper. The lack of any sympathy or imagination here is astounding.
There are no deadlines for journal submissions. Even if you felt you were running close to your revisions being due, an email to an editor would probably fix this for you. And what you described is still negligence: not verifying that the garbage-output bot did not, in fact, output garbage.
Your constructed hypothetical makes it even worse. If there are 2+ people in this scenario who have good intentions, this should especially never happen. When you sign your name on a paper, you are nonetheless vouching for everything written in it, including the things you didn't personally write. You should absolutely be checking every single reference your co-author included and verifying that it says what your co-author claims it says. This is something you should have been doing completely independent of LLMs existing. This is something you're publishing publicly, something that may be associated with you and your career for the rest of your life, it is insanely negligent to not even read and verify what your co-author is adding.
In other words: all it takes for your paper to contain fraud is for your lab partner to add fraud to your paper.
I'm not seeing the problem here. The only problem is that your lab partner should be banned and not you. But being incentivised to check your co-author's work before submission isn't a bad thing.
You’re confusing the issue here by saying it’s not your fault, it’s your lab partner’s. We’re talking about why your lab partner did something wrong. You can assign blame for the wrong thing separately.
The citation is part of the substance of the paper. If you YOLOed in a citation without checking it, seems justified to suspect that you may have YOLOed in some data, or some analysis, or maybe even the conclusion.
This is like saying lawyers should be allowed to submit AI-hallucinated case references or quotations in court documents. Because by your logic, that, too, should be perfectly acceptable. Yet is not, for hopefully obvious reasons. Why exactly should scientific research be any different? If your paper contains hallucinated references, we can't verify your assertions in the paper, and therefore must question the paper as a whole.
You seem incredibly upset you can't get away with fraud and that people are calling it fraud.
The lack of understanding that you are responsible for the content you create, no matter what tools you use, is what's astounding.
If you cannot be bothered to check your references when writing academic quality papers then you have no place writing them in the first place. The punishment is not chopping off a finger, it is a polite reminder to do the bare minimum.
Well, in the good old days, when we had refereed journals, it would have been part of the publishing process.
What's the difference between a "hallucinated" citation and consciously inserting a reference to a non-existent paper and hoping it goes unnoticed? How do we determine which one was done consciously and which was "a minor first-time mistake"?
Your standards are lower than what they would accept at my high-school. Seriously.
And generally, if you are generating papers with LLMs, let other LLMs read them. Why would we waste human hours considering something that was generated? At this point publish your prompt because that's the actual work you're doing.
It's not the kind of mistake that is possible unless you're engaging in fraud anyway.
> It's not the kind of mistake that is possible unless you're engaging in fraud anyway.
Seriously? You can't fathom an honest researcher asking an AI to find a citation they know exists, and the AI inserting or modifying a citation incorrectly without them realizing?
If you find evidence of fraud by all means lay down the hammer. Using a single hallucinated citation like it's some kind of ironclad proxy just because you think they must be committing fraud is insane.
If you're not checking the citations in the paper you're publishing AND trusting a non-SOTA, hallucination-prone AI model to come up with sources for it, it's probably best for everyone that this paper isn't published.
Yes, there will be rare exceptions, but in general I feel like this is a really good addition.
Why would you ask the ai to find a citation you know exists? Just reach for that citation.
If an LLM does the work, you did not write it or research it; the LLM did. You have no business crediting yourself as an author.
If someone writes a paper and an entirely different person takes credit for it without even bothering to check whether the actual writer just made shit up, they deserve a lifetime ban. A year seems like a very light punishment.
Yes, having AI write something and not checking it yourself is sure to lead to hallucinations; hence it is a fraudulent way to write.
> Seriously? You can't fathom an honest researcher asking an AI to find a citation they know exists
Assumptions:
1. The entire document is loaded into an AI editor
2. The researcher is asking an AI editor to work on his references
3. The researcher has not checked his own references.
This could be avoided at 1, 2, or 3. But even just 1 implies that the researcher knows they have a hot potato and might critically fuck up and lose all credibility. Being in that scenario and still committing to 2 and 3 is at least extreme negligence.
If you are citing a work, you paste a citation to that work. If you are bullshitting, you ask an AI to come up with a citation. Jesus, there is zero reason to ever "generate a citation" if you are not, in fact, committing fraud.
Verifying that the reference you cite actually exists is the absolute minimum standard for academic work. It is not optional, not something to skip because of a deadline, and not something to outsource blindly to hallucination-prone AI.
If someone cannot meet that bar, they have no business publishing research papers. I have written academic papers myself, and I find it astonishing that people are trying to justify this as if it were some understandable workflow mistake. At that point it is simply slop with academic formatting. Post it on a blog or somewhere else, but do not put it into the scientific record.
A one-year ban is not a lifetime ban. Maybe six months would also have been enough, but the author can use that time to think about whether they should verify references next time — and to manually check every other citation.
I would not necessarily go as far as calling it fraud, but if you cannot even verify that the reference you are citing actually exists, you are not ready to publish research papers.
> Seriously? You can't fathom an honest researcher asking an AI to find a citation they know exists, and the AI inserting or modifying a citation incorrectly without them realizing?
Indeed I cannot. If you do that, you are not, in fact, an honest researcher. You're a lazy hack.
A citation is where you derived knowledge. If you haven't checked it and you are submitting something that should represent a ton of labour (and which will consume labour to review), you don't understand what you're doing. It is not just crossing T's and dotting I's.
Your being set behind is less important than the fact that your publishing is setting everyone else behind.
Such a banned person is being helped to "step out of the way", and someone more competent will assuredly step forward to consume the limited maintenance labour more thoughtfully.
> Your being set behind is less important than the fact that your publishing is setting everyone else behind
One hallucinated citation does not in any way imply anyone is being left behind. All it means is that nobody checked that particular line of the manuscript after it was written. The rest of the paper could still be solid and should be treated accordingly. If you find evidence to the contrary, by all means treat it accordingly, but this is so obviously not that.
> One hallucinated citation does not in any way imply anyone is being left behind.
The parent said “setting” others behind, which refers to lost time.
Being “left” behind implies a degraded trajectory, which is defined not by time lost, but by the final destination.
Different but related things (e.g. lost time can indeed affect your final destination, for instance, after growing old correcting a scourge of hallucinated citations - which should have been table stakes all along).
No. It's fraud.
You clearly misunderstand. You cite a work in your paper because you have read that work and build upon it, or want to refer to it to back up a specific claim. Generating references is fraud, period: you are implying that you have read a work when in fact you just asked an AI "please insert some reference-shaped text here" to make it look like a proper paper. It is sadly not a necessary, but certainly a VERY sufficient, reason to conclude a paper is fraudulent.
Yes, it is fraud
Don't use AI? Problem solved?
It’s easy to avoid this whole issue: write the paper yourself.
> There's no need to permanently cripple someone's ability to progress their life or contribute to humanity
I don't think you need to publish on arXiv to contribute meaningfully to humanity.
> That's punitive instead of rehabilitative.
Unfortunately, science is competitive. Yours is a race to the bottom, where the people who can afford the most expensive models and who are least concerned with the truth can publish the most papers and benefit financially and professionally by doing so. This is a zero-sum arena: grant money and opportunities may well be awarded to them, and not to another team producing more careful and genuine output.
You are being ironic right?
In science, one hallucinated reference can corrupt the entire rest of the work. So you're completely wrong.
And every piece of work in future which cites the paper with the hallucinated reference.