Comment by nairboon
17 hours ago
Nowadays, high citation numbers no longer mean what they used to. I've seen too many highly cited papers with issues that keep getting referenced, probably because people don't really read the sources anymore and just copy-paste the citations.
On my side-project todo list, I have an idea for a scientific service that overlays a "trust" network on the citation graph. Papers that uncritically cite other work containing well-known issues would get tagged as "potentially tainted". Authors and institutions that accumulate too many such sketchy works would be labeled the same way. Over time this would provide a useful additional signal beyond raw citation counts. You could also look for citation rings and tag them (see the sketch below). I think that could be quite useful, but it requires a bit of work.
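To make the idea concrete, here is a minimal sketch over a toy citation graph. Everything in it is hypothetical: the paper IDs, the taint seed, and the naive "anything citing a tainted paper gets flagged" rule, which a real system would replace with the context-aware judgement discussed further down.

```python
import networkx as nx

# Directed edge A -> B means "paper A cites paper B" (toy data).
G = nx.DiGraph([
    ("p1", "p2"), ("p2", "p3"), ("p3", "p1"),  # a mutual-citation ring
    ("p4", "p1"), ("p5", "p4"),
])

# Papers with well-known issues (retractions, failed replications, ...).
tainted_seeds = {"p1"}

# Naive propagation: everything that directly or transitively cites a
# tainted work gets flagged "potentially tainted".
potentially_tainted = set()
for seed in tainted_seeds:
    potentially_tainted |= nx.ancestors(G, seed)  # all papers citing `seed`

# Citation rings show up as directed cycles in the citation graph.
rings = [cycle for cycle in nx.simple_cycles(G) if len(cycle) > 1]

print(sorted(potentially_tainted))  # ['p2', 'p3', 'p4', 'p5']
print(rings)                        # [['p1', 'p2', 'p3']] (order may vary)
```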
I explored this question a bit a few years ago, when GPT-3 was brand new. It's tempting to look for technological solutions to social problems. It was the COVID era, so public health papers were the focus.
The idea failed a simple sanity check: just going to Google Scholar, doing a generic search, and reading randomly selected papers from the past 15 years or so. It turned out most of them were bogus in some obvious way. A lot of ideas for science reform take it as axiomatic that the bad stuff is rare and just needs to be filtered out. Once you engage with a field's literature in a systematic way, it becomes clear that it's more like searching for diamonds in the rough than filtering out occasional corruption.
But at that point you wonder, why bother? There is no alchemical algorithm that can convert intellectual lead into gold. If a field is 90% bogus then it just shouldn't be engaged with at all.
There is in fact a method, and it got us quite far until we abandoned it for the peer-review-plus-publish-or-perish death spiral in the mid-1900s. It's quite simple:
1) Anyone publishes anything they want, whenever they want, as much or as little as they want. Publishing says nothing about your quality as a researcher, since anyone can do it.
2) Being published doesn't mean the work is right, or even credible. No one is filtering the stream, so there's no cachet to being published.
We then let memetic evolution run its course. This is the system that got us Newton, Einstein, Darwin, Mendeleev, Euler, etc. It works, but it's slow, sometimes ugly to watch, and hard to game, so some people would much rather use the "Approved by a Council of Peers" nonsense we're presently mired in.
I think the solution is very simple: remove the citation metric. Citations don't measure correctness, and correctness is what we want.
Interesting idea. How do you distinguish between a critical and an uncritical citation? It's also a little thorny: if your related-work section is just describing published work (a common form of reviewer-proofing), is that a critical or an uncritical citation? It seems a little harsh to ding a paper for that.
That's one of the issues that makes this a bit of work: citations would need to be judged in context. Say paper X is now known to be tainted. If the tainted work is cited just for completeness, it's not an issue, e.g. "the method has been used in [a, b, c, d, x]". If it is cited critically, even better, e.g. "X claimed to show that..., but y and z could not replicate the results". But if it is taken for granted at face value, then the taint label should propagate, e.g. "...has been previously proved by x, and thus our results are very important...".
"Uncritically" might be the wrong criteria, but you should definitely understand the related work you are citing to a decent extent.
Going to conferences and seeing researchers who've built a career on subpar (sometimes blatantly 'fake') work has made me increasingly wary of experts. Worst of all, lots of people just seem to go along with it.
Still, I'm skeptical about any sort of system trying to figure out 'trust'. There's too much on the line for researchers/students/... to the point where anything will eventually be gamed. There are just too many people trying to get into the system (and getting in is the most important part).
The current system is already being gamed. There's already too much on the line for researchers and students, so they don't admit any wrongdoing or retract anything. What's the worst that could happen by adding a layer of trust on top of the h-index?
I think it could end up helping a bit in the short term. But in the end an even more complicated system (even if better in principle) will reward those who spend time gaming it even more.
The system ends up promoting an even more conservative culture. What might start out great will end with groups and institutions being even more protective of 'their truths' to avoid getting tainted.
I don't think there's any system that can avoid this sort of thing; people were talking about it before WW1, and globalisation just put it into overdrive.
Those citation rings are becoming rampant in my country, along with author-count inflation.
Maybe there should be a different way to calculate the h-index, where for an h-index of n you also need n replications. A sketch of one possible reading is below.
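One way to read that (an assumption; the comment leaves open whether the n replications are per paper or in total) is an h-index where each of the n papers needs at least n citations and at least n independent replications:

```python
def replication_h_index(papers: list[tuple[int, int]]) -> int:
    """papers: (citation_count, replication_count) per paper.

    Returns the largest n such that n papers each have at least
    n citations AND at least n replications.
    """
    # Each paper's score is its binding constraint; then apply the
    # usual h-index computation to the scores.
    scores = sorted((min(cites, reps) for cites, reps in papers), reverse=True)
    return sum(1 for rank, score in enumerate(scores, start=1) if score >= rank)

# A heavily cited paper with zero replications contributes nothing:
print(replication_h_index([(100, 0), (50, 3), (40, 2), (10, 2)]))  # 2
```

The effect is that citations alone can't push the index up; replications become the bottleneck.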