
Comment by jmathai

2 years ago

This seems to be a common view among some folks. Personally, I'm impartial.

Search, or even asking other expert human beings, is prone to producing incorrect results. I'm unsure where this expectation of 100% absolute correctness comes from. I'm sure there are use cases, but I assume they're the vast minority and most can tolerate larger-than-expected inaccuracies.

> I'm unsure where this expectation of 100% absolute correctness comes from.

It's a computer. That's why. Change the concept slightly: would you use a calculator if you had to wonder whether the answer was correct or just made up? Most people feel the same way about any computer-based anything. I personally feel these inaccuracies/hallucinations/whatevs leave them only one rung up from practical jokes. Like I honestly feel the devs are fucking with us.

  • Speech to text is often wrong too. So is autocorrect. And object detection. Computers don't have to be 100% correct in order to be useful, as long as we don't put too much faith in them.

    • Call me old fashioned, but I would absolutely like to see autocorrect turned off in many contexts. I much prefer to read messages with 30% more transparent errors rather than any increase in opaque errors. I can tell what someone meant if I see "elephent in the room", but not "element in the room" (not an actual example, autocorrect would likely get that one right).

    • Your caveat is not the norm though, as everyone is putting a lot of faith in them. So, that's part of the problem. I've talked with people who aren't developers but are otherwise smart individuals, and they have absolutely not considered that the info might not be correct. The readers here are a bit too close to the subject, and sometimes I think it is easy to forget that the vast majority of the population does not truly understand what is happening.


    • People put too much faith in conspiracy theories they find on YT, TikTok, FB, Twitter, etc. What you're claiming is already not the norm. People already put too much faith into all kinds of things.

  • Okay, but search is done on a computer, and like the person you’re replying to said, we accept close enough.

    I don’t necessarily disagree with your interpretation, but there’s a revealed preference thing going on.

    The number of non-tech ppl I’ve heard directly reference ChatGPT now is absolutely shocking.

    • > The number of non-tech ppl I've heard directly reference ChatGPT now is absolutely shocking.

      The problem is that a lot of those people will take ChatGPT output at face value. They are wholly unaware of its inaccuracies or that it hallucinates. I've seen it too many times in the relatively short amount of time that ChatGPT has been around.


  • why should all computing be deterministic?

    let me show you what this "genius"/"wrong-thinking" person has to say about AL (artificial life) and deterministic computing.

    https://www.cs.unm.edu/~ackley/

    https://www.youtube.com/user/DaveAckley

    To sum up a bunch of their content: you can make intractable problems solvable/crunchable if you allow just a little error into the result (error which shrinks the longer the calculation runs). And this is acceptable for a number of use cases where initial accuracy is less important than instant feedback (there's a minimal sketch of this trade-off below).

    It is radically different from the Von Neumann model of a computer, where a deterministic "totalitarian finger pointer" aiming at one register (and only one register at a time) is an inherently limiting factor. In this model, each computational resource (a unit of RAM plus a processing unit) fights for and coordinates reality with its neighbors, without any central coordination.

    Really interesting stuff. Still in its infancy...
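
    To make the "allow a little error, get an answer now" idea concrete, here is a minimal sketch (not Ackley's actual architecture, just the classic Monte Carlo version of the trade-off, in Python): the answer is never exact, but it's available immediately, and the expected error shrinks the longer you let it run.

        import random

        def estimate_pi(samples: int) -> float:
            # Estimate pi by sampling random points in the unit square
            # and counting how many fall inside the quarter circle.
            inside = 0
            for _ in range(samples):
                x, y = random.random(), random.random()
                if x * x + y * y <= 1.0:
                    inside += 1
            return 4.0 * inside / samples

        # Never exact, but the expected error shrinks (roughly as
        # 1/sqrt(n)) the longer the calculation runs.
        for n in (100, 10_000, 1_000_000):
            print(n, estimate_pi(n))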

I'm a software engineer, and I more or less stopped asking ChatGPT for stuff that isn't mainstream. It just hallucinates answers and invents config file options or language constructs. Google will maybe not find it, or give you an occasional outdated result, but it rarely happens that it just finds stuff that's flat out wrong (in technology at least).

For mainstream stuff on the other hand ChatGPT is great. And I'm sure that Gemini will be even better.

  • The important thing is that with Web Search as a user you can learn to adapt to varying information quality. I have a higher trust for Wikipedia.org than I do for SEO-R-US.com, and Google gives me these options.

    With a chatbot that's largely impossible, or at least impractical. I don't know where it's getting anything from - maybe it trained on a shitty Reddit post that's 100% wrong, but I have no way to tell.

    There has been some work (see: Bard, Bing) where the LLM attempts to cite its sources, but even then that's of limited use. If I get a paragraph of text as an answer, is the expectation really that I crawl through each substring to determine their individual provenances and trustworthiness?

    The shape of a product matters. Google as a linker introduces the ability to adapt to imperfect information quality, whereas a chatbot does not.

    As an exemplar of this point - I don't trust it when Google simply pulls answers from other sites and shows them in-line in the search results. I don't know if I should trust the source! At least there I can find out the source with a single click; with a chatbot that's largely impossible.

  • > it rarely happens that it just finds stuff that's flat out wrong

    "Flat out wrong" implies determinism. For answers which are deterministic such as "syntax checking" and "correctness of code" - this already happens.

    ChatGPT, for example, will write and execute code. If the code has an error or returns the wrong result, it will try a different approach. This is in production today (I use the paid version). (A rough sketch of this generate-run-retry loop is at the end of these replies.)

  • I use chatgpt4 for very obscure things

    If I'm ever worried about being quoted, then I'll verify the information

    otherwise I'm just being conversational, have turned an abstract idea into a concrete one, and can build on top of it

    But I’m quickly migrating over to mistral and if that starts going off the rails I get an answer from chatgpt4 instead
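
    Regarding the write-execute-retry behavior described a couple of replies up: a minimal sketch of such a loop might look like the following. This is only an illustration, not OpenAI's actual implementation; generate_code is a hypothetical stand-in for the model call.

        import subprocess
        import sys

        def generate_code(prompt: str, feedback: str = "") -> str:
            # Hypothetical stand-in for the model call; a real tool would
            # return model-generated Python source here.
            raise NotImplementedError

        def solve_with_retries(prompt: str, max_attempts: int = 3) -> str:
            feedback = ""
            for _ in range(max_attempts):
                code = generate_code(prompt, feedback)
                # Run the generated code in a subprocess and capture output.
                result = subprocess.run(
                    [sys.executable, "-c", code],
                    capture_output=True, text=True, timeout=30,
                )
                if result.returncode == 0:
                    return result.stdout
                # On failure, feed the error back so the next attempt
                # can try a different approach.
                feedback = result.stderr
            raise RuntimeError("no working solution after retries")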

I know exactly where the expectation comes from. The whole world has demanded absolute precision from computers for decades.

Of course, I agree that if we want computers to "think on their own" or otherwise "be more human" (whatever that means) we should expect a downgrade in correctness, because humans are wrong all the time.

  • > The whole world has demanded absolute precision from computers for decades.

    Computer engineers maybe. I think the general population is quite tolerant of mistakes as long as the general value is high.

    People generally assign very high value to things computers do. To test this hypothesis all you have to do is ask folks to go a few days without their computer or phone.

  • > The whole world has demanded absolute precision from computers

    The opposite: far too tolerant of the excuse "sorry, computer mistake." (But yes, at the same time as "the computer says so.")

  • Is it less reliable than an encyclopedia? Is it less reliable than Wikipedia? Those aren't infallible, but what's the expectation if it's wrong on something relatively simple?

    With the rush of investment dollars, and the push to use these in places like healthcare, government, security, etc., there should be absolute precision.

Humans are imperfect, but this comes with some benefits to make up for it.

First, we know they are imperfect. People seem to put more faith into machines, though I do sometimes see people being too trusting of other people.

Second, we have methods for measuring their imperfection. Many people develop ways to tell when someone is answering with false or unjustified confidence, at least in fields they spend significant time in. Talk to a scientist about cutting edge science and you'll get a lot of 'the data shows', 'this indicates', or 'current theories suggest'.

Third, we have methods to handle false information that causes harm. Not always perfect methods, but there are systems of remedies available when experts get things wrong, and these even include some level of judging reasonable errors from unreasonable errors. When a machine gets it wrong, who do we blame?

  • Absolutely! And fourth, we have ways to make sure the same error doesn't happen again; we can edit Wikipedia, or tell the person they were wrong (and stop listening to them if they keep being wrong).

I find it ironic that computer scientists and technologists are frequently uberrationalists to the point of self-parody, yet they get hyped about a technology that is often confidently wrong.

Just like the hype with AI and the billions of dollars going into it. There’s something there but it’s a big fat unknown right now whether any part of the investment will actually pay off - everyone needs it to work to justify any amount of the growth of the tech industry right now. When everyone needs a thing to work, it starts to really lose the fundamentals of being an actual product. I’m not saying it’s not useful, but is it as useful as the valuations and investments need it to be? Time will tell.

>I'm unsure where this expectation of 100% absolute correctness comes from. I'm sure there are use cases, but I assume it's the vast minority and most can tolerate larger than expected inaccuracies.

As others hinted at, there's some bias because it's coming from a computer, but I think it's far more nuanced than that.

I've worked with many experts and professionals throughout my career, ranging across medicine, various types of engineering, science, academia, research, and so on. The pattern I often see, and the one that always bothers me, is the level of certainty with which statements are presented; the same certainty is often embedded in LLM responses.

While humans don't typically quantify the certainty of their statements, the best SMEs I've ever worked with make it very clear what level of certainty they have when making professional statements. The SMEs who seem to be wrong more often than not tend to speak with certainty (some of this is due to cultural pressures and expectations surrounding being an "expert").

In this case, I would expect a seasoned scientist to respond to the duck question with something like: "many rubber ducks exist and are designed to float, this one very well might, but we'd really need to test it, or have far more information about the composition of the duck, the design, the medium we want it in (Water? Mercury? Helium?)" and so on. It's not an exact answer, but you understand there's uncertainty there and that we need to better clarify our question and the information surrounding it. The fact is, it's really complex to know whether it'll float from visual information alone.

It could have an osmium ball inside that overcomes most of the assumed buoyancy of the material, including the air demonstrated to make it squeak. It's not transparent. You don't know for sure, and the easiest way to alleviate the uncertainty in this case is simply to test it.
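
As a rough back-of-the-envelope illustration of that point (all numbers are made up): floating comes down to average density versus the medium's, and a hidden dense core flips the answer without changing anything you can see or hear.

    WATER_DENSITY = 1000.0     # kg/m^3
    OSMIUM_DENSITY = 22_590.0  # kg/m^3

    # Made-up numbers: a hollow rubber duck displacing 200 cm^3,
    # with a 50 g shell.
    duck_volume = 200e-6  # m^3
    shell_mass = 0.050    # kg

    def floats(total_mass: float, displaced_volume: float) -> bool:
        # An object floats if its average density is below the medium's.
        return total_mass / displaced_volume < WATER_DENSITY

    print(floats(shell_mass, duck_volume))  # True: ~250 kg/m^3, it floats

    # Hide a 10 cm^3 osmium ball inside the hollow body: it looks and
    # squeaks the same, but the average density is now ~1380 kg/m^3.
    ball_mass = OSMIUM_DENSITY * 10e-6
    print(floats(shell_mass + ball_mass, duck_volume))  # False: it sinks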

There's so much uncertainty in the world, around what seem like the most certain and obvious things. LLMs seem to have grabbed some of this bad behavior from human language and culture where projecting confidence is often better (for humans) than being correct.

Most people I've worked with either tell me "I don't know" or "I think x, but I'm not sure" when they are not sure about something; the issue with LLMs is that they don't have this concept.

The bigger problem is lack of context. When I speak with a person or review search results, I can use what I know about the source to evaluate the information I'm given. People have different areas of expertise and use language and mannerisms to communicate confidence in their knowledge or lack thereof. Websites are created by people (most times) and have a number of contextual clues that we have learned to interpret over the years.

LLMs do none of this. They pose as a confident expert on almost everything, and are just as likely to spit out BS as a true answer. They don't cite their sources, and if you ask for the source sometimes they provide ones that don't contain the information cited or don't even exist. If you hired a researcher and they did that you wouldn't hire them again.

1. Humans may also never be 100% correct - but it seems they are more often correct. 2. When AI is wrong, it's often not only slightly off but completely off the rails. 3. Humans often tell you when they are not sure, even if it's only in their tone. AI is always 100% convinced it's correct.

If it’s no better than asking a random person, then where is the hype? I already know lots of people who can give me free, maybe incorrect guesses to my questions.

At least we won’t have to worry about it obtaining god-like powers over our society…

  • > At least we won’t have to worry about it obtaining god-like powers over our society…

    We all know someone who's better at self promotion than at whatever they're supposed to be doing. Those people often get far more power than they should have, or can handle—and ChatGPT is those people distilled.

Let's see, so we exclude law, we exclude medical... it's certainly not a "vast minority" and the failure cases are nothing at all like search or human experts.

  • Are you suggesting that failure cases are lower when interacting with humans? I don't think that's my experience at all.

    Maybe I've only ever seen terrible doctors but I always cross reference what doctors say with reputable sources like WebMD (which I understand likely contain errors). Sometimes I'll go straight to WebMD.

    This isn't a knock on doctors - they're humans and prone to errors. Lawyers, engineers, product managers, teachers too.

    • Do you think that if you ask your legal assistant to find some precedents related to your current case, they will come back with an A4 page full of made-up cases that sound vaguely related and convincing but are not real? I don't think you understand the failure case at all.


Guessing from the last sentence that you are one of those "most" who "can tolerate larger than expected inaccuracies".

How much inaccuracy would that be?

Where did you get the 100% number from? It's not in the original comment, it's not in a lot of similar criticisms of the models.

Honestly I agree. Humans make errors all the time. Perfection is not necessary and requiring perfection blocks deployment of systems that represent a substantial improvement over the status quo despite their imperfections.

The problem is a matter of degree. These models are substantially less reliable than humans and far below the threshold of acceptability in most tasks.

Also, it seems to me that AI can and will surpass the reliability of humans by a lot. Probably not by simply scaling up further or by clever prompting, although those will help, but by new architectures and training techniques. Gemini represents no progress in that direction as far as I can see.

There's a huge difference between demonstrating something with fuzzy accuracy and playing something off as if it's giving good, correct answers. An honest way to handle that would be to highlight where the bot got it wrong instead of running with the answer as if it was right.

Deception isn't always outright lying. This video was deceitful in form and content and presentation. Their product can't do what they're implying it can, and it was put together specifically to mislead people into thinking it was comparable in capabilities to gpt-4v and other competitor's tech.

Working for Google AI has to be infuriating. They're doing some of the most cutting edge research with some of the best and brightest minds in the field, but their shitty middle management and marketing people are doing things that undermine their credibility and make them look like untrustworthy fools. They're a year or more behind OpenAI and Anthropic, barely competitive with Meta, and they've spent billions of dollars more than any other two companies, with a trashcan fire for a tech demo.

It remains to be seen whether they can even outperform Mistral 7b or some of the smaller open source models, or if their benchmark numbers are all marketing hype.

If a human expert gave wrong answers as often and as confidently as LLMs, most would consider no longer asking them. Yet people keep coming back to the same LLM despite the wrong answers to ask again in a different way (try that with a human).

This insistence on comparing machines to humans to excuse the machine is as tiring as it is fallacious.