Comment by mapontosevenths
1 day ago
> It's a shoddy, irresponsible way to work. And also plagiarism, when you claim authorship of it.
It reminds me of kids these days and their fancy calculators! Those newfangled doohickeys just aren't reliable, and the kids never realize that they won't always have a calculator on them! Everyone should just do it the good old-fashioned way with slide rules!
Or these darn kids and their unreliable sources like Wikipedia! Everyone knows that you need a nice solid reliable source that's made out of dead trees and fact-checked by up to 3 paid professionals!
I doubt that it's common for anyone to read a research paper and then question whether the researcher's calculator was working reliably.
Sure, maybe someday LLMs will be able to report facts in a mostly reliable fashion (like a typical calculator), but we're definitely not even close to that yet, so until we are the skepticism is very much warranted. Especially when the details really do matter, as in scientific research.
> whether the researcher's calculator was working reliably.
LLMs do not work reliably; that's not their purpose.
If you use them that way it's akin to using a butter knife as a screwdriver. You might get away with it once or twice, but then you slip and stab yourself. Better to go find a screwdriver if you need reliable results.
> I doubt that it's common for anyone to read a research paper and then question whether the researcher's calculator was working reliably
Reproducibility and repeatability in the sciences?
Replication crisis > Causes > Problems with the publication system in science > Mathematical errors; Causes > Questionable research practices > In AI research; Remedies > [..., open science, reproducible workflows, disclosure]: https://en.wikipedia.org/wiki/Replication_crisis#Mathematica...
Already, machine-verifiable proofs run to impossibly many pages for human review.
There are "verify each Premise" and "verify the logical form of the Argument" (P therefore Q) steps that the model still doesn't do for the user.
For your domain, how insufficient is the output given a process prompt like:
Identify hallucinations from models prior to (date in the future)
Check each sentence of this: ```{...}```
Research ScholarlyArticles (and then their Datasets) which support and which reject your conclusions. Critically review findings and controls.
Suggest code to write to apply data science principles to proving correlative and causative relations given already-collected observations.
Design experiment(s) given the scientific method to statistically prove causative (and also correlative) relations
Identify a meta-analytic workflow (process, tools, schema, and maybe code) for proving what is suggested by this chat
I'm really not persuaded by this argument; it seems a false equivalence. It's not merely a spell checker or removing some tedium.
As a professional mathematician, I use Wikipedia all the time to look up quick facts before verifying them myself or elsewhere. As for a calculator, well, I can use an actual programming language.
Up until this point, neither of those tools was advertised or used by people to entirely replace human input.
There are some interesting possibilities for LLMs in math, especially in terms of generating machine-checked proofs using languages like Lean. But this is a supplement to the actual result, where the LLM would be producing a more rigorous version of a human's argument with all the boring steps included.
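To make "machine-checked" concrete, here's a deliberately trivial toy example (my own, chosen only for brevity): once Lean's kernel accepts the proof term, the authority is the checker, not the author.

    -- Every natural number is ≤ its successor. If an LLM emitted this
    -- proof and Lean accepted it, it is exactly as valid as a
    -- human-written one; the kernel verifies it either way.
    theorem le_succ_self (n : Nat) : n ≤ n + 1 :=
      Nat.le_succ n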
In a few cases, I see Terence Tao has pointed out examples of LLMs actually finding proofs of open problems unassisted. Not necessarily problems anyone cared deeply about. But there's still the fact that if the proof holds, then it's valid no matter who or what came up with it.
So it's complicated I guess?
I hate to sound like a 19 year old on Reddit but:
AI People: "AI is a completely unprecedented technology where its introduction is unlike the introduction of any other transformative technology in history! We must treat it totally differently!"
Also AI People: "You're worried about nothing, this is just like when people were worried about the internet."
The internet analogy is apt because it was in fact a massive bubble, but that bubble popping didn't mean the tech went away. Same will happen again, which is a point both extremes miss. One would have you believe there is no bubble and you should dump all your money into this industry, while the other would have us believe that once the bubble pops all this AI stuff will be debunked and discarded as useless scamware.
Well, the internet has definitely changed things; but it also wasn't initially controlled by a bunch of megacorps with the level of power and centralisation we see today.
> Those new fangled doohickeys just aren't reliable
Except they are (unlike a chatbot, a calculator is perfectly deterministic), and the unreliability of LLMs is one of their most widespread targets of criticism, if not the most.
Low effort doesn't even begin to describe your comment.
As low effort as you hand waving away any nuance because it doesn’t agree with you?
> Except they are (unlike a chatbot, a calculator is perfectly deterministic)
LLMs are supposed to be stochastic. That is not a bug; I can see why you find that disappointing, but it's just the reality of the tool.
However, as I mentioned elsewhere, calculators also have bugs, and those bugs make their way into scientific research all the time. Floating-point errors are particularly common, as are order-of-operations problems, because physical devices get it wrong frequently and are hard to patch. Worse, they are not SUPPOSED TO BE stochastic, so when they fail nobody notices until it's far too late. [0 - PDF]
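You don't even need a buggy device to see this class of failure. A few lines in any language using IEEE-754 doubles (my own toy examples, not from the paper below) show how silently it bites:

    print(0.1 + 0.2 == 0.3)   # False: 0.1 + 0.2 -> 0.30000000000000004
    # Addition isn't associative, so order of operations changes answers:
    print((0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3))   # False
    # Large magnitudes silently swallow small ones:
    print(1e16 + 1 - 1e16)    # 0.0, not 1.0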
Further, spreadsheets are no better. For example, a scan of ~3,600 genomics papers found that about 1 in 5 had gene-name errors (e.g., SEPT2 → "2-Sep") because that's how Excel likes to format things.[1] Again, this is much worse than a stochastic machine doing its stochastic job... because it's not SUPPOSED to be random; it's just broken, and on a truly massive scale.
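The depressing part is how easy these are to catch. A rough sketch of a check (the pattern is mine and only covers the common day-month form Excel emits, not the cited scan's actual method):

    import re

    # Flag gene symbols that Excel has silently turned into dates,
    # e.g. SEPT2 -> "2-Sep", MARCH1 -> "1-Mar".
    DATE_LIKE = re.compile(
        r"^\d{1,2}-(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)$")

    def mangled(symbols):
        return [s for s in symbols if DATE_LIKE.match(s)]

    print(mangled(["TP53", "2-Sep", "BRCA1", "1-Mar"]))  # ['2-Sep', '1-Mar']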
[0] https://ttu-ir.tdl.org/server/api/core/bitstreams/7fce5b73-1...
[1] https://www.washingtonpost.com/news/wonk/wp/2016/08/26/an-al...
That’s a strange argument. There are plenty of stochastic processes that have perfectly acceptable guarantees. A good example is Karger’s min-cut algorithm. You might not know what you get on any given single run, but you know EXACTLY what you’re going to get when you crank up the number of trials.
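For anyone who hasn't seen it, the whole algorithm plus its amplification-by-repetition guarantee fits in a few lines (a sketch of the random-edge-permutation variant; representing the graph as an edge list is just my choice here):

    import random
    from math import comb, log

    def contract_once(n, edges):
        # One run of random contraction: merge the endpoints of randomly
        # ordered edges (via union-find) until only 2 super-nodes remain.
        parent = list(range(n))

        def find(x):
            while parent[x] != x:
                parent[x] = parent[parent[x]]
                x = parent[x]
            return x

        nodes = n
        pool = edges[:]
        random.shuffle(pool)
        for u, v in pool:
            if nodes == 2:
                break
            ru, rv = find(u), find(v)
            if ru != rv:
                parent[ru] = rv
                nodes -= 1
        # Cut size = edges crossing between the two remaining super-nodes.
        return sum(1 for u, v in edges if find(u) != find(v))

    def karger_min_cut(n, edges, delta=0.01):
        # One run succeeds with probability >= 2/(n*(n-1)); repeating
        # C(n,2)*ln(1/delta) times bounds the failure probability by delta.
        trials = int(comb(n, 2) * log(1 / delta)) + 1
        return min(contract_once(n, edges) for _ in range(trials))

    print(karger_min_cut(4, [(0, 1), (1, 2), (2, 3), (3, 0)]))  # 2 (a 4-cycle)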
Nobody can tell you what you are going to get when you run an LLM once. Nobody can tell you what you’re going to get when you run it N times. There are, in fact, no guarantees at all. Nobody even really knows why it can solve some problems and not others, except maybe that it memorized the answer at some point. But this is not how they are marketed.
They are marketed as wondrous inventions that can SOLVE EVERYTHING. This is obviously not true. You can verify it yourself, with a simple deterministic problem: generate an arithmetic expression of length N. As you increase N, the probability that an LLM can solve it drops to zero.
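If you want to see the failure curve yourself, the experiment is a dozen lines (a sketch; ask_llm is a placeholder for whatever model API you're testing, not a real library call):

    import random

    def random_expr(n_ops, lo=1, hi=99):
        # Build an expression with n_ops operators, e.g. "37 + 4 * 81 - 6".
        expr = str(random.randint(lo, hi))
        for _ in range(n_ops):
            expr += f" {random.choice(['+', '-', '*'])} {random.randint(lo, hi)}"
        return expr

    def accuracy(ask_llm, n_ops, trials=100):
        hits = 0
        for _ in range(trials):
            expr = random_expr(n_ops)
            truth = eval(expr)  # exact integer arithmetic, our ground truth
            if ask_llm(f"Compute exactly: {expr}") == truth:
                hits += 1
        return hits / trials

    # Plot accuracy(model, n_ops) as n_ops grows; the claim is it falls to zero.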
Ok, fine. This kind of problem is not a good fit for an LLM. But which is? And after you’ve found a problem that seems like a good fit, how do you know? Did you test it systematically? The big LLM vendors are fudging the numbers. They’re testing on the training set, they’re using ad hoc measurements and so on. But don’t take my word for it. There’s lots of great literature out there that probes the eccentricities of these models; for some reason this work rarely makes its way into the HN echo chamber.
Now I’m not saying these things are broken and useless. Far from it. I use them every day. But I don’t trust anything they produce, because there are no guarantees, and I have been burned many times. If you have not been burned, you’re either exceptionally lucky, you are asking it to solve homework assignments, or you are ignoring the pain.
Excel bugs are not the same thing. Most of those problems can be found trivially. You can find them because Excel is a language with clear rules (just not clear to those particular users). The problem with Excel is that people aren’t looking for bugs.
Annoying dismissal.
In an academic paper, you condense a lot of thinking and work, into a writeup.
Why would you blow off the writeup part, and impose AI slop upon the reviewers and the research community?
I don't necessarily disagree, but researchers are not required to be good communicators. An academic can lead their field and be a terrible lecturer. A specialist can let a generalist help explain concepts for them.
They should still review the final result though. There is no excuse for not doing that.
I disagree here. A good researcher has to be a good communicator. I'm not saying you necessarily fail to understand a topic if you can't explain it well to someone new, but communicating is essential to a good exchange of ideas with others, and consequently to becoming a better researcher. This is one of the skills you learn in a PhD program.
One issue with this analogy is that calculators really are precise when used correctly. LLMs are not.
I do think they can be used in research but not without careful checking. In my own work I’ve found them most useful as search aids and brainstorming sounding boards.
> I do think they can be used in research but not without careful checking.
Of course you are right. It is the same with all tools, calculators included, if you use them improperly you get poor results.
In this case they're stochastic, which isn't something people are used to from computers yet. You have to understand that and learn how to use them accordingly, or you will get poor results.
> One issue with this analogy is that calculators really are precise when used correctly. LLMs are not.
I made this a separate comment, because it's wildly off topic, but... they actually aren't. Especially for very large numbers or for high precision. When's the last time you did a firmware update on yours?
It's fairly trivial to find lists of calculator flaws and then identify them in research papers. I recall reading a research paper about it in the '00s.
One issue with this analogy is that paper encyclopedias really are precise when used correctly. Wikipedia is not.
I do think it can be used in research but not without careful checking. In my own work I've found it most useful as a search aid and for brainstorming.
^ this same comment 10 years ago
Paper encyclopedias were neither precise nor accurate. You could count on them to give you ballpark figures most of the time, but certainly not precise answers. And that's assuming the set was new; in reality, most encyclopedias people actually encountered were several years old at least. I remember the encyclopedia set I had access to in the 90s was written before the USSR fell.
> I do think it can be used in research but not without careful checking.
This is really just restating what I already said in this thread, but you're right. That's because Wikipedia isn't a primary source and was never, ever meant to be. You are SUPPOSED to read it, then click through to the primary sources and cite those.
Lots of people use it incorrectly and get bad results because they still haven't realized this... all these years later.
Same thing with treating stochastic LLMs like sources of truth and knowledge. Those folks are just doing it wrong.