Comment by mojomark
2 years ago
"LLMs are basically designed so that wrong answers look as close as possible to right answers"
I work in the robotics field and we've had a strong debate going since ChatGPT launched. Every debate ends with "so, how can you trust it?" Trust is at the heart of all machine learning models - some (e.g. decision trees) yield answers that are more interrogable to humans than others (e.g. neural nets). If what you say is a problem, then maybe the solution is either a.) don't do that (i.e. don't design the system to 'always look right'), or b.) add a simple disclaimer (like we use on signs near urinals to tell people 'don't eat the blue mints').
I use ChatGPT every day now. I use it (and trust it) like (and as much as) one of my human colleagues. I start with an assumption, I ask and I get a response, and then I judge the response based on the variance from expectation. Too high, and I either re-ask or I do deep research to find out why my assumption was so wrong - which is valuable. Very small, and I may ask it again to confirm, or depending on the magnitude of consequences of the decision, I may just assume it's right.
Bottom line, these engines, like any human, don't need to be 100% trustworthy. To me, this new class of models just needs to save me time and make me more effective at my job... and it's doing that. These models need to be trustworthy enough. What that means is subjective to the user, and that's OK.
I mostly agree with you - I find LLMs to be very useful in my work, even when I need to verify the output.
But two things I'd highlight:
1. You say you work "in the robotics field", so I'm guessing you work mainly amongst scientists and engineers, i.e. the people most specifically trained to evaluate data.
2. LLMs are not being marketed as this kind of "useful tool whose output you need to separately verify". Heck, it feels like half the AI community (cultish, IMO) is crowing about how these LLMs are just a step away from AGI.
Point being, I can find LLMs to be a very useful tool for me personally while still thinking they are being vastly (and dangerously) overhyped.
> a.) don't do that (i.e. don't design the system to 'always look right'),
How would that work? I was naively under the impression that that's very approximately just how LLMs work.
> b.) add a simple disclaimer (like we use on signs near urinals to tell people 'don't eat the blue mints').
Gemini does stick a disclaimer at the bottom. I think including it is good but wholly inadequate: people will ignore it, whether by genuinely not seeing it, by forgetting about it, or by brushing it off as overly-careful legalese that doesn't actually matter (LLM responses are known to the state of California to cause cancer).
This disclaimer sits below each and every chat. It's about as useful as the signs telling you to wash your hands after using the toilet. Either you care about it or you don't, and that sign doesn't change that.
No one expects a computer to be wrong in that way though.
Why would a computer connected to the sum total of all human knowledge just make up answers when it can find the correct answer?
I guarantee no one is going around saying, “Do not trust the output and always check the results”.
good assumption + confirming response = ok
good assumption + negative response = ok (research)
bad assumption + negative response = ok (research)
bad assumption + confirming response = uh oh
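Spelled out as a tiny decision function, just to make the logic explicit (a rough sketch; the names are my own shorthand, nothing formal):

    # Rough Python sketch of the matrix above. Names are illustrative only.
    def next_step(assumption_is_good: bool, response_confirms: bool) -> str:
        if assumption_is_good and response_confirms:
            return "ok: accept the answer"
        if assumption_is_good and not response_confirms:
            return "ok: research why the answer diverged from expectation"
        if not assumption_is_good and not response_confirms:
            return "ok: research and fix the assumption"
        return "uh oh: a bad assumption just got confirmed"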
I also use LLMs every day, but you must be very self-aware while using them, otherwise they can waste a lot of your time.
"bad assumption + confirming response = uh oh"
100%, but as a rational being, my action on that response depends on the severity of the consequences of a wrong decision.
Hyperbolic example: I think I have cancer and suspect I might die in a year. I go to the doctor, and she says "yes, you have cancer and are going to die in 6 months."
What do I do? I, personally, go get a second opinion. Even upon hearing a second time that I will die soon, when faced with death, I'm probably going to spend a little time doing my own research to see if there isn't some new trial out there for treating my condition.
On the other hand, if I ask a friend whether the green apple lollipop he's eating tastes good and he responds that it's one of the best flavors he's ever experienced, I'm probably going to give it a whirl, because the worst-case outcome is just a sour face.