Yet again, more people on this site equating "failed to reproduce" with "the original study can't possibly be correct and is probably fraudulent".
That's not how it works. Science is hard, experiment design is hard, and a failure to reproduce could mean a bunch of different things. It could mean the original research failed to mention something critical, or you had a fluke, or you didn't understand the process right, or something about YOUR setup is unknowingly different. Or the process itself is somewhat stochastic.
This goes 10X for such difficult sciences as psychology (which is literally still in its infancy) and biology. In these fields, designing a proper experiment (controlling as much as you can) is basically impossible, so we have to tease signal out of noise, and it's failure-prone.
Hell, go watch YouTube chemists who have PhDs fail to reproduce old papers. Were those papers fraudulent? No, science is just difficult and failure prone.
If you treat "Paper published in Nature/Science" as a source of truth, you will regularly be wrong. Scientists do not do that. Nature is a magazine and a business; it sees itself as pushing the cutting edge of research, and it will happily publish an outright fraudulent paper if there is even the slightest chance it might be valid, and especially if it would be really cool if it were right.
When discussing how Jan Hendrik Schön got tens of outright fraudulent papers into Nature, despite nobody being able to even confirm he ran any experiments, they said that "even false papers can push the field forward". One of the scientists who investigated Schön and helped get him fired even said that peer review is no indicator of quality or correctness. Peer review wasn't even a formal part of science publishing until the 1960s.
Science is "self correcting" because if the "effect" you saw isn't real, nobody will be able to build off your work. Alzheimer's Amyloid research has been really unproductive, which is how we knew it probably wasn't the magic bullet even before it had fraud scandals.
If you doubt this, look to China. They have ENORMOUS amounts of explicit fraud in their system, as well as a MUCH WORSE "publish or perish" state. Would you suggest it has slowed them down?
Stop trying to outsource your critical thinking to an authority. You cannot do science without publishing wrong or false papers. If you are reading about "science" in a news article, press release, or advertisement, you don't know science. I am continually flabbergasted by how often "Computer Scientists" don't even know the basics of the scientific method.
Scientists understood there was a strong link between cigarettes and cancer at least 20 years before we had comprehensive scientific studies to "prove" it.
That said, there are good things to do to mitigate the harms that "publish or perish" causes, like preregistration and an incentive to publish failed experiments, even though science progressed pretty well for 400 years without them. These reproducibility projects are great, but do not mistake their "these papers failed" as "these papers were written fraudulently, or by bad scientists, or were a waste".
Good programmers WILL ship bugs sometimes. Good scientists WILL publish papers that don't pan out. These are truths of human processes and imperfect systems.
"This goes 10X for such difficult sciences as psychology (which is literally still in infancy) and biology. In these fields, designing a proper experiment (controlling as much as you can) is basically impossible, so we have to tease signal out of noise and it's failure prone."
For psychology replace "Difficult" with "Pseudo".
To lose that tag, psychology has to take a step back, do basic research, replicate that research multiple times, think about how to do replicable new research, and only then start actually letting psychologists do new research to advance science.
Instead of that, unreplicated pseudo-scientific nonsense psychology papers are being used to tell governments how to force us to live our lives.
> Hell, go watch YouTube chemists who have PhDs fail to reproduce old papers. Were those papers fraudulent? No, science is just difficult and failure prone.
Agreed. Lab technique is a thing. There is a reason for the dark joke that in Physics, theorists are washed up by age 30, but experimentalists aren't even competent until age 40.
At the end of the day, people most trust results that validate in multiple datasets. No one really cherry picks one thing and builds off of that or they get slammed in peer review until they come back with sufficient evidence in the literature or through novel experiments.
A lot of things, in fact, do work. Hence, modern science producing so much despite this reproducibility crisis being even worse in decades past.
Academia is 90% a scam these days and plenty of the professors involved are criminals. A criminal is someone who commits a crime (or many) [1], before some purist comes to ask "what do you mean?".
The most common crime they commit is fraud, the second most common is sexual harassment, and the third would be plagiarism, although that one might not necessarily be punishable depending on the jurisdiction.
(IMO. I can't provide data on that, and I'm not willing to prosecute them personally; if that's a deal-breaker for you, that's fine by me.)
I know academia like the back of my hand and have been all around the world; it's the same thing everywhere. I can speak loudly about it because I'm Catholic and have money, so those lowlifes can't touch me :D.
Every single time this topic comes up, there's a lot of resistance from "the public", who are willing to go to great lengths to defend "the academics" even though they know absolutely nothing about academic life and their only grasp of it was formed through TV and movies.
Anyone who has been involved in academia for more than like two years can tell you the exact same thing. That doesn't mean they're also rotten; I'm just saying they've seen all these things taking place around them.
We should really move the Overton window on this topic so that scientists are held to the same public scrutiny as everybody else, like public officials, because nine times out of ten they are funded by public money. They should be held accountable, and there should be jail for the offenders.
https://archive.is/mmzWj
The crisis in science can only be fixed by addressing the slew of bad incentives built into the system. We can't predicate the job security, promotion, and prestige of every early-career scientist on publishing as many papers as possible, and on obtaining grants (which requires publishing as many papers as possible), and then expect high-quality science. We can't starve universities of public funding and expect them not to selectively hire scientists whose main skill is publishing hundreds of "exciting" papers, and not to overproduce low-quality future "scientists" trained in the dark arts of academic survival. Reform is more urgent than ever; AI has essentially made obsolete the mental model that equates the count of published papers with productivity and quality.
I can't say this enough: independent reproduction must be part of the process, or we'll continue seeing this issue. As you say, it's the incentives. One solution that seems reasonably possible for 95+% of research would be to lock away 30% or so of the research funds, to be given to another team, ideally at another university, that gets access only to the original team's publication and has the goal of reproducing the study. The vast majority of papers released don't contain enough information to actually repeat the work.
And since we are talking about science reform, let's start with the much easier and cheaper preregistration [1] which helps massively with publication bias.
[1] https://en.wikipedia.org/wiki/Preregistration_(science)
It would be interesting for reproducibility efforts to assess “consequentiality” of failed replications, meaning: how much does it matter that a particular study wasn’t reproducible? Was it a niche study that nobody cited anyway, or was it a pivotal result that many other publications depended on, or anything in between those two extremes?
I would like to think that the truly important papers receive some sort of additional validation before people start to build lives and livelihoods on them, but I’ve also seen some pretty awful citation chains where an initial weak result gets overegged by downstream papers which drop mention of its limitations.
It is an ongoing crisis how much Alzheimer’s research was built on faked amyloid beta data. Potentially billions of dollars from public and private research which might have been spent elsewhere had a competing theory not been overshadowed by the initial fictitious results.
The amyloid hypothesis is still the top candidate for at least one form of Alzheimer's. But yes, the problems with one of the early studies have caused significant issues.
I say "a form of Alzheimer's" because it is likely we are labelling a few different diseases as Alzheimer's.
I went searching for more info on this and found https://www.science.org/content/blog-post/faked-beta-amyloid... which was an interesting read.
Those studies were all run and paid for, many/most with public funding. Of course it matters.
Reproducing a paper is Hard, and also Expensive. I'd expect that they wouldn't pick papers to try and reproduce at random.
The median sample size of the studies subjected to replication was n = 5 specimens (https://osf.io/atkd7). Probably because only protocols with an estimated cost of less than BRL 5,000 (around USD 1,300 at the time) per replication were included. So it's not surprising that only ~60% of the original biochemical assays' point estimates were in the replicates' 95% prediction intervals. The mouse maze anxiety test (~10%) seems to be dragging down the average. n = 5 just doesn't give reliable estimates, especially in rodent psychology.
This should be the top comment on HN, where most users claim to have some grasp of statistics. N = 5 implies a statistical uncertainty of about 45%, so they measured what one would expect, which is essentially nothing. Also, this is specifically about Brazilian biomedical studies, and it contains no evidence to support people's various personal vendettas against other fields in other countries. At least read the article, people.
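The 45% figure is presumably the standard error of a five-specimen mean relative to the between-specimen scatter, i.e. 1/√5 ≈ 0.45. A minimal simulation sketch of that arithmetic, assuming a normal model with a common standard deviation sigma (my own illustration, not anything from the paper):

    # Back-of-the-envelope: with n = 5, the standard error of a sample mean is
    # sigma/sqrt(5), roughly 45% of the between-specimen standard deviation, so two
    # honest n = 5 studies of the same effect often disagree from sampling noise alone.
    import numpy as np

    rng = np.random.default_rng(0)
    n, sigma = 5, 1.0

    print("relative SE of one n=5 mean:", 1 / np.sqrt(n))                 # ~0.447
    print("SD of the gap between two n=5 means:", sigma * np.sqrt(2 / n)) # ~0.63*sigma

    # Simulate 100k "original vs. replication" pairs measuring the same true effect.
    orig = rng.normal(0.0, sigma, size=(100_000, n)).mean(axis=1)
    repl = rng.normal(0.0, sigma, size=(100_000, n)).mean(axis=1)
    print("pairs differing by > 0.5*sigma by chance alone:",
          np.mean(np.abs(orig - repl) > 0.5 * sigma))                     # roughly 4 in 10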
Glad this is getting some attention
For the central limit theorem to hold, the random variables must be independent and identically distributed (i.i.d.). How do we know our samples are i.i.d.? We can only show when they are not.
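For reference, the classical Lindeberg–Lévy statement being invoked, with the i.i.d. requirement doing the heavy lifting:

    % CLT: for i.i.d. X_1, X_2, ... with E[X_i] = \mu and Var(X_i) = \sigma^2 < \infty,
    \sqrt{n}\,\bigl(\bar{X}_n - \mu\bigr) \xrightarrow{\;d\;} \mathcal{N}\!\left(0,\sigma^2\right),
    \qquad \bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i .

Dependence or clustering (e.g. littermates, shared cages, batch effects) and heterogeneous distributions all break that premise, and none of them announce themselves in the data.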
Add to that https://en.m.wikipedia.org/wiki/Why_Most_Published_Research_...
We've got to do better or science will stagnate
another funny portal to hell - https://www.youtube.com/@rivervalleytv/videos
This doesn’t really surprise me at all. It’s an unrelated field, but part of the reason I got completely disillusioned with research to the point I switched out of a program with a thesis was because I started noticing reproducibility problems in published work. My field is CS/CE, generally papers reference publicly available datasets and can be easily replicated… except I kept finding papers with results I couldn’t recreate. It’s possible I made mistakes (what does a college student know, after all), but usually there were other systemic problems on top of reproducibility. A secondary trait I would often notice is a complete exclusion of [easily intuited] counter-facts because they cut into the paper’s claim.
To my mind there is a nasty pressure that exists for some professions/careers, where publishing becomes essential. Because it’s essential, standards are relaxed and barriers lowered, leading to the lower quality work being published. Publishing isn’t done in response to genuine discovery or innovation, it’s done because boxes need to be checked. Publishers won’t change because they benefit from this system, authors won’t change because they’re bound to the system.
All it takes is 14 grad students studying the same thing at the 95% confidence level for it to be more likely than not that one of them stumbles on a spurious 5% result. Factor in publication bias and you get a bunch of junk data.
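A quick check of that arithmetic, assuming the 14 studies are independent and each uses alpha = 0.05:

    # Expected spurious hits and the chance of at least one among k independent studies.
    k, alpha = 14, 0.05
    print(k * alpha)              # 0.7   -> expected number of false positives
    print(1 - (1 - alpha) ** k)   # ~0.51 -> better-than-even odds of at least one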
I think I heard this idea from Freakonomics, but a fix is to propose research to a journal before conducting it, with a commitment to publication regardless of outcome.
A great idea. Also known as a pre-registered study.
https://en.m.wikipedia.org/wiki/Preregistration_(science)
Most pharma / medicine studies are pre-registered now. Sometimes the endpoints change based on what the scientists are seeing, but if they're worth their salt, they still report the original scoped findings as well.
Not familiar with this idea, but it is commonly applied to grant applications: only apply for a grant once you have finished the thing you promise to work on. Then use the grant money to prototype the next five ideas (of which maybe one works), because science is about exploration.
> A secondary trait I would often notice is a complete exclusion of [easily intuited] counter-facts because they cut into the paper’s claim.
It's lack of industry experience. I complained about this in a recent comment here: https://news.ycombinator.com/item?id=43769856
Basically, in SE anyway, the largest number of publications are authored by new graduates.
Think about how clueless the new MSc or PhD graduate is when they join your team: these are the majority of authors.
The system is set up to incentivise the wrong thing.
The state of CS papers is truly awful, as they're uniquely positioned to be 100% reproducible. And yet my experience aligns with yours in that they very rarely are.
Even more ridiculous is the number of papers that do not include code. Sure, maybe Google cannot offer an environment to replicate the underlying 1PB dataset, but for mortals, this is rarely a concern.
Even better is when the paper says code will be released after publication, but they cannot be bothered to post it anywhere.
I can second this; even availability of the code is still a problem. However, I would not say CS results are rarely reproducible, at least from the few experiences I have had so far, though I have heard of problematic cases from others. I guess it also differs between fields.
I want to note there is hope. Contrary to what the root comment says, some publishers try to endorse reproducible results. See for example the ACM reproducibility initiative [1]. I have participated in this before and believe it is a really good initiative. Reproducing results can be very labor-intensive, though, further loading a review system already struggling under massive floods of papers. And it is also not perfect: most of the time it is only ensured that the author-supplied code produces the presented results, but I still think more such initiatives are healthy. When you really want to ensure the rigor of a presented method, you have to replicate it, e.g., using a different programming language, which is really its own research endeavor. And there is already a place to publish such results in CS [2]! (although I haven't tried this one). I imagine this may be especially interesting for PhD students just starting out in a new field, as it gives them the opportunity to learn while satisfying the expectation of producing papers.
[1] https://www.acm.org/publications/policies/artifact-review-an... [2] https://rescience.github.io
This same post appears at the top of every single HN story on reproducibility. “I was a student in [totally unrelated field] and found reproducibility to be difficult. I didn’t investigate it deeply and ultimately I left the field, not because I was unsuccessful, of course, but because I understood deeply despite my own extremely limited experience in the area that all of the science was deeply flawed if not false.”
Imagine the guy who got a FAANG job and made it nine weeks in before washing out, informing you how the entire industry doesn’t know how to write code. Maybe they’re right and the industry doesn’t know how to write code! But I want to hear it from the person who actually made a career, not the intern who made it through part of a summer.
The problem is the negative feedback cycle: someone who has spent decades in academia and is highly published has, almost by definition, not experienced the pains of industry practitioners.
Their findings are often irrelevant to industry at best and contradictory at worst.
Of course I'm talking almost solely about SE.
This seems like a straw man. The stories are much more complex than this (in my experience/opinion), usually directly reporting immoral acts by peers, lack of support, unfair/unequal treatment, hypocrisy, and so on. The failed reproduction itself is at best an intermezzo.
Not to mention that we know a lot of overhyped results did fail replication, and then powerful figures in academia did their best to pretend that their thrones were not perched on top of sandcastles.
In my field, trying to reproduce results or conclusions from papers happens on a regular basis especially when the outcome matters for projects in the lab. However, whatever the outcome, it can't be published because either it confirms the previous results and so isn't new or it doesn't and no journal wants to publish negative results. The reproducibility attempts are generally discussed at conferences in the corridors between sessions or at the bar in the evening. This is part of how a scientific consensus is formed in a community.
Care to share which field is this?
cell/molecular biology
As part of the larger reproducibility crisis including social science, I wonder how much these things contribute to declining public confidence in science and the post-truth era generally.
pretty crazy reading all this and realizing how shaky some "facts" really are - you think the root problem comes from pressure to publish or is it just sloppy science piling up over time?
Is there any path forward to fixing the current reproducibility crisis in science? Individuals can do better, but that won't solve a problem at this scale. Could we make systemic changes to how papers are validated and approved for publication in major journals?
Pre-registration is a pretty big one: essentially, you outline your research plan (what you’re looking for, how you will analyze the data, what bars you are setting for significance, etc.) before you do any research. Your plan is reviewed and accepted (or denied), often by both the funding agency and the journal you want to submit to, before they know the results.
Then you perform the experiment exactly* how you said you would based on the pre-registration, and you get to publish your results whether they are positive or negative.
* Changes are allowed, but must be explicitly called out and a valid reason given.
https://en.wikipedia.org/wiki/Preregistration_(science)
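For a concrete flavour of what gets pinned down before any data exist, here is an invented toy example of the kind of fields a registration records (illustrative only, not any registry's actual schema):

    # Invented toy pre-registration record (not any registry's real schema).
    prereg = {
        "hypothesis": "Compound X lowers marker Y in treated mice vs. vehicle controls",
        "primary_endpoint": "mean marker Y at day 14",
        "analysis_plan": "two-sided Welch t-test on the primary endpoint",
        "significance_threshold": 0.05,
        "sample_size_per_arm": 12,  # fixed in advance by a power calculation
        "exclusion_rules": "only the outlier criteria listed here, decided before unblinding",
    }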
Wow, I didn't think it's possible, but it sounds like a great way to make research boring :).
From the perspective of a dishonest researcher, what are the compliance barriers to secretly doing the research work, and only after that doing the pre-registration?
Disclosure: I'm a scientist, specializing in scientific measurement equipment, so of course reproducibility is my livelihood.
But at the same time, I doubt that fields like physics and chemistry had better practices in, say, the 19th century. It would be interesting to conduct a reproducibility project on the empirical studies supporting electromagnetism or thermodynamics. There were probably a lot of crap papers!
Those fields had a backup, which was that studies and theories were interconnected, so that they tended to cross-validate one another. This also meant that individual studies were hot-pluggable. One of them could fail replication and the whole edifice wouldn't suddenly collapse.
My graduate thesis project was never replicated. For one thing, the equipment that I used had been discontinued before I finished, and cost about a million bucks in today's dollars. On the other hand, two labs built similar experiments that were considerably better, made my results obsolete, and enabled further progress. That was a much better use of resources.
I think fixing replication will have to involve more than replication itself; it means thinking about how science progresses as a whole.
Reproducibility studies are costly in time, reagents, and possibly irreplaceable primary samples. I usually would prefer a different study looking at similar mechanisms using different methods than a reproduction of the original methods, although there’s an important place for direct replication studies like this as well. We can also benefit from data sleuths uncovering fraud, better whistleblower systems, and more ability for graduate students to transfer out of toxic labs and into better ones with their funding, reputation and research progress intact.
Scientists have informal trust networks that I’d like to see made explicit. For example, I’d like to see a social media network for scientists where they can PRIVATELY specify trust levels in each other and in specific papers, and subscribe to each others’ trust networks, to get an aggregated private view of how their personal trusted community views specific labs and papers.
> Scientists have informal trust networks that I’d like to see made explicit. For example, I’d like to see a social media network for scientists where they can PRIVATELY specify trust levels in each other and in specific papers, and subscribe to each others’ trust networks, to get an aggregated private view of how their personal trusted community views specific labs and papers.
That sounds fascinating, but I'd have a darned high bar to participate, to make sure I wasn't inadvertently disclosing my very personal trust settings. Past experiences with intentional or unintentional data deanonymization (or just insufficient anonymization) make me very wary of such claims.
A dream of mine was that in order to get a PhD, you would not have to publish original research, but instead you would have to _reproduce existing research_. This would bring the PhD student to the state of the art in a different way, and it would create a natural replication process for current research. Your thesis would be about your replication efforts, what was reproducible and what was not, etc.
And then, once you got your PhD, only then you would be expected to publish new, original research.
That used to be the function of undergraduate and Masters theses at the Ivy League universities. "For the undergraduate thesis, fix someone else's mistake. For the Master's thesis, find someone else's mistake. For the PhD thesis, make your own mistake."
Well, in some fields some PhD classes involve a lot of reproducing (at least partially) others' papers.
Yes, but nobody wants to acknowledge the elephant in the room. Once again, this is why defunding research has gained merit. If more than half of new research is fake, don't protest when plugs are being pulled; you're protesting empirical results.
Science (including all the fake stuff) has advanced humanity immensely. I cannot imagine that cutting research funding to do less science (with the same percentage of fakes) is helpful in any way.
> more than half of new research is fake
You committed the same sin you are attempting to condemn, while sophomorically claiming it is obvious this sin deserves an intellectual death penalty.
It made me smile. :) Being human is hard!
Now I'm curious, will you acknowledge the elephant in this room? It's hard to, I know, but I have a strong feeling you have a commitment to honesty even if it's hard to always enact all the time. (i.e. being a human is hard :) )
I had always envisioned an institute for reproducibility & peer review. It would be a federally funded institute that would make PhD-candidate participation an additional requirement for receiving your degree. Really it wouldn't be a single place but an office or team at each university where proper equipment, and perhaps similar conditions, were available for reproducing specific research. Of course the feasibility of this is pretty low.
IMO, stopping the race toward better h index.
There is a huge amount of pressure to publish, publish, publish.
So, many researchers prefer to write very simple things that are probably true, or applied work (which is kind of useful), or to publish false/fake results.
Maybe try to define a "reproducible" h-index, i.e., your publication doesn't count (or counts less) until a different team has reproduced your results, and the team doing the reproduction gets some points too.
(And maybe add more points if, in order to reproduce, the other team didn't have to ask the original team plenty of questions, i.e., the original paper didn't omit essential information.)
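A rough sketch of how such a score could be computed (purely illustrative; the 25% weight for unreplicated papers and the data layout are my own assumptions, not an established metric):

    # Hypothetical "reproducible h-index": a paper's citations count fully only once an
    # independent team has replicated it; unreplicated papers are heavily discounted.
    def reproducible_h_index(papers, unreplicated_weight=0.25):
        # papers: list of dicts like {"citations": 42, "replicated": True}
        scores = sorted(
            (p["citations"] * (1.0 if p["replicated"] else unreplicated_weight)
             for p in papers),
            reverse=True,
        )
        h = 0
        for rank, score in enumerate(scores, start=1):
            if score >= rank:
                h = rank
        return h

    papers = [
        {"citations": 120, "replicated": True},
        {"citations": 40, "replicated": False},
        {"citations": 15, "replicated": True},
        {"citations": 3, "replicated": False},
    ]
    print(reproducible_h_index(papers))  # unreplicated papers count at 25%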
I'm curious, I don't get why the downvotes? Having to race to publish pushes people to cheat. It didn't occur to me that this was a bad point, but if you have a different opinion I would gladly hear it!
Yeah "individuals do better" is never the answer -- you've got to structure incentives, of course.
I don't think you want to slow down publication (and peer review and prestige journals are probably useless/obsolete in the era of the internet); it's already crazy slow.
So let's see: you want to incentivize two things: (1) no false claims in original research, and (2) people actually trying to reproduce claims.
So here's a humble proposal for a funding source (say... the govt): set aside a pot of money specifically for people to try to reproduce research; let this be a valid career path. The goal should be to get research validated by reproduction before OTHER research starts to build on those premises (avoiding having the whole field go off on wild goose chases like what happened with Alzheimer's). And then, when results DON'T repro, blackball the original researchers from funding. (With whatever sort of due process is needed to make this reasonable.)
I think it'd sort things out.
Punishing researchers who make mistakes or get unlucky due to noise in the data is a recipe for disaster, just like in other fields. The ideal amount of fraud and false claims in research is not zero, because the policing effort it would take to accomplish this goal would destroy all other forms of value. I can't emphasize enough how bad an idea blackballing researchers for publishing irreproducible results would be.
We have money to fund direct reproducibility studies (this one is an example), and indirect replication by applying orthogonal methods to similar research topics can be more powerful than direct replication.
On the data analysis side, I think making version control both mandatory and automatic would go a long way.
One issue is that internal science within a company/lab can move incredibly fast -- assays, protocols, datasets and algorithms change often. People tend to lose track of what data, what parameters, and what code they used to arrive at a particular figure or conclusion. Inevitably, some of those end up being published.
Journals requiring data and code for publication helps, but it's usually just one step at the end of a LONG research process. And as far as I'm aware, no one actually verifies that the code you submitted produces the figures in your paper.
It's a big reason why we started https://GoFigr.io. I think making reproducibility both real-time and automatic is key to making this situation better.
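To make the "automatic" part concrete, here is a minimal sketch of the general idea: record a provenance manifest (code revision, data hash, parameters) next to every figure so it can be traced later. This is my own toy example, not GoFigr's API:

    # Toy provenance manifest, saved alongside each figure so anyone can later see
    # exactly which code revision, data file, and parameters produced it.
    import datetime
    import hashlib
    import json
    import subprocess

    def save_figure_manifest(fig_path, data_path, params):
        manifest = {
            "figure": fig_path,
            "created": datetime.datetime.now(datetime.timezone.utc).isoformat(),
            "git_commit": subprocess.run(
                ["git", "rev-parse", "HEAD"], capture_output=True, text=True
            ).stdout.strip(),
            "data_sha256": hashlib.sha256(open(data_path, "rb").read()).hexdigest(),
            "params": params,
        }
        with open(fig_path + ".manifest.json", "w") as f:
            json.dump(manifest, f, indent=2)
        return manifest

    # e.g. right after plt.savefig("fig3.png"):
    # save_figure_manifest("fig3.png", "assay_results.csv",
    #                      {"normalization": "quantile", "threshold": 0.05})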
There's usually indirect reproduction. For instance I can take some principle from a study and integrate it into something else. The real issue is that if the result is negative - at least from my understanding - the likelihood of publication is minimal, so it isn't communicated. And if the principle I've taken was at fault there's a lot of space for misattribution, I could blame a litany of different confounders for failures until, after some long while I might decide to place blame on the principle itself. That itself may require a complete rework of any potential paper, redoing all the experiments (depending on how anal one is in data collection).
Just open up a comment section for institutional affiliates.
Yes. Accepting the uncertainty and publishing more than a few.
Often the famous/most-cited studies are not replicable. But if you want to work on a similar research problem and publish null/unexciting results, you're in for a fight. Journals want new, fun, exciting results, but unfortunately the world doesn't work that way.
Yes, but it costs money. There's no solution that wouldn't.
IMO, the best way forward would be simply doubling every study with independent researchers (ideally they shouldn't have contact with each other beyond the protocol). That certainly doubles the costs, but it's really just about the only way to catch bad actors early.
> Yes, but it costs money. There's no solution that wouldn't.
True, although, as you doubtless know, as with most things that cost money, the alternative also costs money (for example, in funding experiments chasing after worthless science). It's just that we tend to set aside the costs that we have already priced in. So I tend to think in such settings that a useful approach might be to see how we can make such costs more visible, to increase the will to address them.
I follow Vinay Prasad (https://substack.com/@vinayprasadmdmph) to keep up on these topics. It feels like getting a portal to the future in some way as he's on the cutting edge of analyzing the quality of the analysis in a ton of papers. You get to see what conclusions are likely to change in the next handful of years as the information becomes more widespread.
Second this. I have learned so much from this guy over the last five years. He is an expert at reading between the lines of studies
In addition to his Substack, his Twitter is great and very accessible.
"Trust the science"
I’ve not got the context on why Brazil was chosen here (paywall), but I coincidentally read a story on here about Richard Feynman visiting Brazil, in which he assessed their teaching and tried to impart his own teaching and learning techniques.
The answer is straightforward. They are a coalition of Brazilian labs (click on the link in the first sentence for more information), so it seems natural that they would focus on research conducted in their own country. Also, it is not the first study of its kind, as the Nature article notes:
> The teams were able to replicate the results of less than half of the tested experiments. That rate is in keeping with that found by other large-scale attempts to reproduce scientific findings. But the latest work is unique in focusing on papers that use specific methods and in examining the research output of a specific country, according to the research teams.
A lot of people have pointed out a reproducibility crisis in social sciences, but I think it's interesting to point out this happens in CompSci as well when verifying results is hard.
Reproducing ML Robotics papers requires the exact robot/environment/objects/etc -> people fudge their numbers and have strawman implementation of benchmarks.
LLMs are so expensive to train + the datasets are non-public -> Meta trained on the test set for Llama4 (and we wouldn't have known if not for some forum leak).
In some way it's no different than startups or salesmen overpromising - it's just lying for personal gain. The truth usually wins in the end though.
And all the drugs and treatments derived from those "studies" are going to continue to be prescribed for another couple of decades, much like they were cutting people up to "cure ulcers" long after it was proven that an antibiotic is all you really need to cure it. It took about a decade for that bulletproof, 100% reproducible study to make much of a difference in the field.
Are you one of those people who somehow believe that, because the pop-culture "chemical imbalance" ideology was never factual, SSRIs don't work?
They are continually prescribed because their actual mechanism doesn't matter: they demonstrably work. That is a matter of statistics, not science.
Anti-science types always point to the same EXTREMELY FEW examples of how science "fails", like Galileo (which had nothing to do with science) and ulcers.
They never seem to point to the much more common examples where people became convinced of something scientifically untrue for decades despite plenty of evidence otherwise. The British recognized a link between citrus and scurvy well before they were even called "Limeys"! They then screwed themselves over by changing some variables (cooking the lime juice) and instead turned to a quack (a "respected doctor" from a time when most people recognized doctors were worse than the sickness they treated) who insisted on alternative treatments. For about a hundred years, British sailors suffered and died because of one quack's ego.
Phrenology was always, from day one, unscientific. You STILL find morons pushing its claims, using it to justify their godawful, hateful, and murderous worldviews.
Ivermectin is a great example, since you can create a "study" in Africa to show that Ivermectin cures anything you want: because it is a parasite killer and most people in impoverished areas suffer from parasites, they will improve if they take it. It's entirely unrelated to the illness you claim to treat, but nobody on Facebook will ever understand that, because they tuned out science education decades ago.
How many people have died from alternative medicine quacks pushing outright disproven pseudoscience on people who have been told not to trust scientists by people pushing an agenda?
How much money is made selling sugar pills to idiots who have been told to distrust science? Not just "be skeptical of any paper", but outright "scientists are in a conspiracy to lie to you!"
SSRIs may work, but the science isn't settled that they work better than a placebo: https://bmjopen.bmj.com/content/9/6/e024886.full. And they come with side effects, like sexual dysfunction, that other treatments (like therapy) don't carry.
Yet again, more people on this site equating "failed to reproduce" with "the original study can't possibly be correct and is probably fraudulent".
That's not how it works. Science is hard, experiment design is hard, and a failure to reproduce could mean a bunch of different things. It could mean the original research failed to mention something critical, or you had a fluke, or you didn't understand the process right, or something about YOUR setup is unknowingly different. Or the process itself is somewhat stochastic.
This goes 10X for such difficult sciences as psychology (which is literally still in infancy) and biology. In these fields, designing a proper experiment (controlling as much as you can) is basically impossible, so we have to tease signal out of noise and it's failure prone.
Hell, go watch YouTube chemists who have PhDs fail to reproduce old papers. Were those papers fraudulent? No, science is just difficult and failure-prone.
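To put a number on how often an honest replication of a perfectly real effect can still "fail", here's a minimal sketch (assumed parameters, not taken from the Brazilian project): with tiny samples, most replications of a genuine, moderately sized effect don't reach significance, purely because of low power.

    # Toy power illustration (assumed numbers): a real effect, replicated with
    # small samples, still misses p < 0.05 in most attempts.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    true_effect = 0.5     # a genuine, moderately sized effect
    n_per_group = 5       # tiny replication samples
    attempts = 10_000

    failures = 0
    for _ in range(attempts):
        control = rng.normal(0.0, 1.0, n_per_group)
        treated = rng.normal(true_effect, 1.0, n_per_group)
        _, p = stats.ttest_ind(treated, control)
        if p >= 0.05:
            failures += 1

    print(f"replications that 'fail' despite a real effect: {failures / attempts:.0%}")
    # with n = 5 per group, that's the overwhelming majority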
If you treat "paper published in Nature/Science" as a source of truth, you will regularly be wrong. Scientists do not do that. Nature is a magazine and a business; it sees itself as pushing the cutting edge of research, and it will happily publish an outright fraudulent paper if there is even the slightest chance it might be valid, especially if it would be really cool if it were right.
When discussing how Jan Hendrik Schön got tens of outright fraudulent papers into Nature, despite nobody being able to confirm he even ran any experiments, they said that "even false papers can push the field forward". One of the scientists who investigated Schön and helped get him fired even said that peer review is no indicator of quality or correctness. Peer review wasn't even a formal part of science publishing until the 1960s.
Science is "self-correcting" because if the "effect" you saw isn't real, nobody will be able to build off your work. Alzheimer's amyloid research has been really unproductive, which is how we knew it probably wasn't the magic bullet even before the fraud scandals.
If you doubt this, look to China. They have ENORMOUS amounts of explicit fraud in their system, as well as a MUCH WORSE "publish or perish" state. Would you suggest it has slowed them down?
Stop trying to outsource your critical thinking to an authority. You cannot do science without publishing wrong or false papers. If you are reading about "science" in a news article, press release, or advertisement, you don't know science. I am continually flabbergasted by how often "Computer Scientists" don't even know the basics of the scientific method.
Scientists understood there was a strong link between cigarettes and cancer at least 20 years before we had comprehensive scientific studies to "prove" it.
That said, there are good ways to mitigate the harms that "publish or perish" causes, like preregistration and an incentive to publish failed experiments, even though science progressed pretty well for 400 years without them. These reproducibility projects are great, but do not mistake their "these papers failed" for "these papers were written fraudulently, or by bad scientists, or were a waste".
Good programmers WILL ship bugs sometimes. Good scientists WILL publish papers that don't pan out. These are truths of human processes and imperfect systems.
"This goes 10X for such difficult sciences as psychology (which is literally still in infancy) and biology. In these fields, designing a proper experiment (controlling as much as you can) is basically impossible, so we have to tease signal out of noise and it's failure prone."
For psychology replace "Difficult" with "Pseudo".
To lose that tag, psychology has to take a step back, do basic research, replicate that research multiple times, think about how to do replicable new research, and only then start letting psychologists do new research to advance the science.
Instead, unreplicated, pseudo-scientific nonsense psychology papers are being used to tell governments how to force us to live our lives.
> Hell, go watch YouTube chemists who have PhDs fail to reproduce old papers. Were those papers fraudulent? No, science is just difficult and failure-prone.
Agreed. Lab technique is a thing. There is a reason for the dark joke that in Physics, theorists are washed up by age 30, but experimentalists aren't even competent until age 40.
At the end of the day, people most trust results that validate across multiple datasets. No one really cherry-picks one result and builds off it; if they do, they get slammed in peer review until they come back with sufficient evidence from the literature or from novel experiments.
A lot of things, in fact, do work. Hence modern science produces so much, despite the reproducibility crisis having been even worse in decades past.
Academia is 90% a scam these days, and plenty of the professors involved are criminals. A criminal is someone who commits a crime (or many) [1], before some purist comes asking "what do you mean?".
The most common crime they commit is fraud, the second most common is sexual harassment, and the third would be plagiarism, although that last one might not necessarily be punishable depending on the jurisdiction.
(IMO. I can't provide data on that, and I'm not willing to prosecute them personally; if that's a deal-breaker for you, that's fine by me.)
I know academia like the back of my hand and have been everywhere around the world; it's the same thing all over. I can speak loudly about it because I'm Catholic and have money, so those lowlifes can't touch me :D
Every single time this topic comes up, there's a lot of resistance from "the public", who are willing to go to great lengths to defend "the academics" even though they know absolutely nothing about academic life and their only grasp of it comes from TV and movies.
Anyone who has been involved in academia for more than about two years can tell you the exact same thing. That doesn't mean they're also rotten; I'm just saying they've seen all these things taking place around them.
We should really move the Overton window on this topic so that scientists are held to the same public scrutiny as everybody else, like public officials, because, by the way, nine times out of ten they are funded by public money. They should be held accountable, and there should be jail for the offenders.
1: https://dictionary.cambridge.org/dictionary/english/criminal
If they had just used NixOS, reproducibility would be less of a problem!
I find it bizarre that people find this problematic.
Even Einstein tried to find flaws in his own theories. This is how science should actually work.
We need to actively try to falsify theories and beliefs. Only if we fail to falsify them should the theories be considered valid.
If scientific studies aren't reproducible with the reported confidence, they fail as science.
It would be worse if the experiments were not even falsifiable, yes.
But it's pretty damn bad when the conclusion of the original study can never be confirmed on the rare occasion that someone actually tries.
I am not saying we should be happy about the results.
I am saying we should be happy that the scientific method is working.
These studies didn't try to find theories; they tried to find results.
In your example, it’s the same as someone publishing a paper that disproves Relativity - only for us to find that the author fabricated the data.