Comment by philovivero

2 years ago

> Both of those are systems which had to work right. Large language models are not even close to being safe to use in such applications. Until LLMs report "don't know" instead of hallucinating, they're limited to very low risk applications such as advertising and search.

Are humans limited to low-risk applications like that?

Because humans, even some of the most humble, will still assert things they THINK are true, but are patently untrue, based on misunderstandings, faulty memories, confused reasoning, and a plethora of others.

I can't count the number of times I've had conversations with extremely well-experienced, smart techies who just spout off the most ignorant stuff.

And I don't want to count the number of times I've personally done that, but I'm sure it's >0. And I hate to tell you, but I've spent the last 20 years in positions of authority that could have caused massive amounts of damage not only to the companies I've been employed by, but a large cross-section of society as well. And those fools I referenced in the last paragraph? Same.

I think people are too hasty to discount LLMs, or LLM-backed agents, or other LLM-based applications because of their limitations.

(Related: I think people are too hasty to discount the catastrophic potential of self-modifying AGI as well)

Can people please stop making this comment in reply to EVERY criticism of LLMs? "Humans are flawed too".

We do not normally hallucinate. We are sometimes wrong, and sometimes are wrong about the confidence they should attach to their knowledge. But we do not simply hallucinate and spout fully confidence nonsense constantly. That is what LLMs.

You remember a few isolated incidents because they're salient. That does not mean that it's representative of your average personal interactions.

  • > We do not normally hallucinate.

    Oh yes we do lol. Many experiments show our perception of reality and of cognition is entirely divorced from the reality of what's really going on.

    Your brain is making stuff up all the time. Sense data you perceive is partly fabricated. Your memories are partly fabricated. Your decision rationales are post hoc rationalizations more often than not. That is, you don't genuinely know why you make certain decisions or what preferences actually inform them. You just think you do. You can't recreate previous mental states. You are not usually aware. But it is happening.

    LLMs are just undoubtedly worse right now.

    • We don’t hallucinate in such a way / to the extent that it compromises our ability to do our jobs.

      Currently no one will trust an LLM to even run a helpline - that's just a lawsuit waiting to happen should the AI hallucinate a “solution” that results in loss of property, injury, or death.


  • > We do not normally hallucinate. We are sometimes wrong, and sometimes are wrong about the confidence they should attach to their knowledge. But we do not simply hallucinate and spout fully confidence nonsense constantly. That is what LLMs.

    In my average interaction with GPT-4 there are far fewer errors than in this paragraph. I would say that here you in fact "spout fully confidence nonsense" (sic).

    Some humans are better than others at saying things that are correct, and at saying things with appropriately calibrated confidence. Some LLMs are better than some humans in some situations at doing these things.

    You seem to be hung up on the word "hallucinate". It is, indeed, not a great word and many researchers are a bit annoyed that's the term that's stuck. It simply means for an LLM to state something that's incorrect as if it's true.

    The times that LLMs do this do stand out, because "You remember a few isolated incidents because they're salient".

    • > Some humans are better than others at saying things that are correct, and at saying things with appropriately calibrated confidence.

      That's true - which is why we have constructed a society with endless selection processes. Starting from kindergarten, we are constantly assessing people's abilities - so that by the time someone is interviewing for a safety critical job they've been through a huge number of gates.

> Are humans limited to low-risk applications like that?

No, but arguably civilization consists of mechanisms to manage human fallibility (separation of powers, bicameralism, "democracy", bureaucracy, regulations, etc). We might not fully understand why, but we've found methods that sorta kinda "work".

> could have caused

That's why they didn't.

  • > No, but arguably civilization consists of mechanisms to manage human fallibility

    Exactly. Civilization is, arguably, one big exercise in reducing variance in individuals, as low variance and high predictability are what let us work together and trust each other, instead of seeing each other as threats and hiding from each other (or trying to preemptively attack). The more something or someone is unpredictable, the more we see it or them as a threat.

    > (separation of powers, bicameralism, "democracy", bureaucracy, regulations, etc).

    And on the more individual scale: culture, social customs, and the public school system are all forces that shape humans from the youngest age, reducing variance in thoughts and behaviors. Exams of all kinds, including psychological ones, prevent high-variance individuals from being able to do large amounts of harm to others. The higher the danger, the higher the bar.

    There are tests you need to pass to be able to own and drive a car. There are tests you may need to pass to own a firearm. There are more tests still before you'll be allowed to fly an aircraft. Those tests are not there just to ensure your skills - they also filter high-variance individuals, people who cannot be safely given responsibility to operate dangerous tools.

    Further still, society has mechanisms to eliminate high-variance outliers. Lighter cases may get some kind of medical or spiritual treatment, and (with gates in place to keep them away from guns and planes) it works out OK. More difficult cases eventually get locked up in prisons or mental hospitals. While there are a lot of specific things to discuss about the prison and mental care systems, their general, high-level function is simple: they keep both predictably dangerous and high-variance (i.e. unpredictably dangerous) people stashed safely away, where they can't disrupt or harm others at scale.

    > We might not fully understand why, but we've found methods that sorta kinda "work".

    Yes, we've found many such methods at every level - individual, familial, tribal, national - and we stack them all on top of each other. This creates the conditions that let us live in larger groups with less conflict, and to safely use increasingly powerful (i.e. potentially destructive) technologies.

    • I think you’re weighting the contribution of authority a bit too highly. The bad actors to be concerned about are a very small percentage of the population and we do need institutions with authority to keep those people at bay but it’s not like there’s this huge pool of “high variance” people that need to be screened out. The vast majority of people are extremely close in both opinion and ability, any semblance of society would be impossible otherwise.


>Because humans, even some of the most humble, will still assert things they THINK are true, but are patently untrue, based on misunderstandings, faulty memories, confused reasoning, and a plethora of others.

> I can't count the number of times I've had conversations with extremely well-experienced, smart techies who just spout off the most ignorant stuff.

Spouting out the most ignorant stuff is one of the lowest-risk things you can do in general. We're talking about running code where a bug can do a ton of damage, financial or otherwise, not water-cooler conversations.

In the train example, the UI is in place to prevent a person from setting a dangerous route. I think the idea here is that an LLM cannot take the place of such a UI, since LLMs are inherently unreliable.

To your point, humans are augmented by checklists and custom processes in critical situations, and applications can certainly include mechanisms that mimic such safety checklists (rough sketch below). We don't NEED to start from an LLM-centric perspective if our goal is different and doesn't benefit from an LLM. Not all UIs or architectures are fit for all purposes.
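
To make that "checklist" idea concrete, here is a minimal sketch in Python (all names hypothetical, not any real interlocking system): the LLM is only allowed to suggest a route, and a deterministic validator has the final say, just as the train-routing UI does for a human operator.

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Route:
        from_signal: str
        to_signal: str

    # Hypothetical hard-coded safety rules standing in for a real interlocking table.
    BLOCKED_ROUTES = {Route("S1", "S7")}   # e.g. conflicting movements
    OCCUPIED_BLOCKS = {"S7"}

    def is_route_safe(route: Route) -> bool:
        """Deterministic checklist: every rule must pass; no model judgment involved."""
        if route in BLOCKED_ROUTES:
            return False
        if route.to_signal in OCCUPIED_BLOCKS:
            return False
        return True

    def set_route(llm_suggestion: Route) -> str:
        # The LLM's output is treated as untrusted input, never as authority.
        if not is_route_safe(llm_suggestion):
            return "rejected: fails safety checklist"
        return f"route {llm_suggestion.from_signal} -> {llm_suggestion.to_signal} set"

    print(set_route(Route("S1", "S7")))  # rejected, regardless of how confident the LLM was
    print(set_route(Route("S1", "S4")))  # allowed only because the deterministic rules permit it

The point isn't the specific rules; it's that the safety-critical decision never passes through the model.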

Couldn’t you make this same argument with a chat bot that wasn’t an LLM at all?

“Yes, it may have responded with total nonsense just now, but who among us can say they’ve never done the same in conversation?”

> Are humans limited to low-risk applications like that?

Yes, of course. That's why the systems the parent mentioned designed humans out of the safety-critical loop.

> Because humans, even some of the most humble, will still assert things they THINK are true, but are patently untrue, based on misunderstandings, faulty memories, confused reasoning, and a plethora of others.

> I can't count the number of times I've had conversations with extremely well-experienced, smart techies who just spout off the most ignorant stuff.

The key difference is that when the human you're having a conversation with states something, you're able to ascertain the likelihood of it being true based on available context: How well do you know them? How knowledgeable are they about the subject matter? Does their body language indicate uncertainty? Have they historically been a reliable source of information?

No such introspection is possible with LLMs. Any part of anything they say could be wrong and to any degree!

I wholeheartedly agree with the main thrust of your comment. Care to expand on your (related: potential catastrophe) opinion?