Comment by iamflimflam1
2 years ago
It's true in many experiments. The desire to get the result you want can often overwhelm the need to validate what you are getting.
Especially true when the results confirm any pre-existing thinking you may have.
One particular example that I remember from an introductory particle physics class is the History Plots section[1] of the biennial review of experimental data.
Knowing these quantities is important, but their particular values largely aren’t; nobody’s funding or career really depended on them being equal to one thing or another. Yet look at all the jumps, where the measurements after the initial very rough ones got stuck in the completely wrong place until the jump to the right value—when it happened—was of a completely implausible magnitude, like four, six, or ten sigma.
[1] https://pdg.lbl.gov/2023/reviews/rpp2022-rev-history-plots.p...
What's also good to see here is that the post-'90 numbers usually don't even fall within the error bars of the pre-'90 ones. Reporting uncertainty is great, but it isn't the be-all and end-all. I think a lot of people forget how difficult evaluation actually is. Usually we just look at one or two metrics and judge based on that, but such an evaluation is incredibly naive. Metrics and measures are only guides; they provide neither certainty nor targets.
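To put a number on that kind of disagreement: the usual rough measure of tension between an old and a new measurement is the difference divided by the combined uncertainty. A minimal sketch in Python, with made-up values purely for illustration (the function name and numbers are mine, not from the PDG plots):

    import math

    def tension_sigma(x_old, s_old, x_new, s_new):
        # Discrepancy between two independent measurements,
        # in units of their combined standard deviation.
        return abs(x_new - x_old) / math.sqrt(s_old**2 + s_new**2)

    # Hypothetical values: an older result of 1.000 +/- 0.010
    # versus a newer one of 1.060 +/- 0.010 -> about 4.2 sigma.
    print(tension_sigma(1.000, 0.010, 1.060, 0.010))

If the quoted error bars were honest, jumps of four to ten combined sigma would essentially never happen; that they show up repeatedly is exactly what the history plots make visible.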
Yep, confirmation bias. Luckily peer review helps with that!
Hasn’t this paper made it through peer review?
Yeah it was published at ACL ( https://aclanthology.org/2023.findings-acl.426/ ) which is one of the most prestigious conferences in NLP. So kinda disappointing.
But reviewers usually aren't expected to look at a paper's actual source code, and they definitely don't try to reproduce the results. They just read the paper itself, which of course doesn't mention the error.
Not sure what the best solution is, other than having the most "hyped" papers double-checked by researchers on Twitter.
I suspect GP commenter meant "replication study" rather than "peer review".
;-)
(Peer review doesn't check if your data is correct. They check your data collection methods make sense given the hypothesis you're testing, and that your conclusions are supported by the data you collected.)