Comment by Animats
2 years ago
This article isn't too helpful.
There have been many "UI Paradigms", but the fancier ones tended to be special purpose. The first one worthy of the name was for train dispatching. That was General Railway Signal's NX (eNtry-Exit) system.[1] Introduced in 1936, still in use in the New York subways. With NX, the dispatcher routing an approaching train selected the "entry" track on which the train was approaching. The system would then light up all possible "exit" tracks from the junction. This took into account conflicting routes already set up and trains present in the junction. Only reachable exits lit up. The dispatcher pushed the button for the desired exit. The route setup was then automatic. Switches moved and locked into position, then signals along the route went to clear. All this was fully interlocked; the operator could not request anything unsafe.
There were control panels before this, but this was the first system where the UI did more than just show status. It actively advised and helped the operator. The operator set the goal; the system worked out how to achieve it.
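Roughly, the entry-exit logic looks something like this (a toy sketch in Python; the routes, segments, and names are all made up for illustration, not GRS's actual design):

    # Toy model of the NX idea: pick an entry, see only the exits that can be
    # reached without conflicting with locked routes or trains in the junction.
    # Each candidate route is keyed by (entry, exit) and lists the track
    # segments it needs exclusive use of.
    ROUTES = {
        ("E1", "X1"): {"s1", "s2"},
        ("E1", "X2"): {"s1", "s3"},
        ("E2", "X2"): {"s4", "s3"},
        ("E2", "X3"): {"s4", "s5"},
    }

    def available_exits(entry, locked_routes, occupied_segments):
        """Return the exits that would 'light up' for the given entry."""
        blocked = set(occupied_segments)
        for route in locked_routes:
            blocked |= ROUTES[route]
        return sorted(
            exit_ for (entry_, exit_), segments in ROUTES.items()
            if entry_ == entry and not (segments & blocked)
        )

    def select_route(entry, exit_, locked_routes, occupied_segments):
        """Lock the route if it is still safe; refuse anything unsafe."""
        if exit_ not in available_exits(entry, locked_routes, occupied_segments):
            raise ValueError("conflicting or unreachable route; request refused")
        locked_routes.add((entry, exit_))  # switches move and lock, then signals clear

    locked = {("E2", "X2")}                              # a route already set through s3
    print(available_exits("E1", locked, occupied_segments=set()))  # -> ['X1']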
Another one I encountered was an early computerized fire department dispatching system. Big custom display boards and keyboards. When an alarm came in, it was routed to a dispatcher. Based on location, the system picked the initial resources (trucks, engines, chiefs, and special equipment) to be dispatched. Each dispatcher had a custom keyboard, with one button for each of those resources. The buttons lit up indicating the selected equipment. The dispatcher could add additional equipment with a single button push, if the situation being called in required it. Then they pushed one big button, which set off alarms in fire stations, printed a message on a printer near the fire trucks, and even opened the doors at the fire house. There was a big board at the front of the room which showed the status of everything as colored squares. The fire department people said this cut about 30 seconds off a dispatch, which, in that business, is considered a big win.
Both of those are systems which had to work right. Large language models are not even close to being safe to use in such applications. Until LLMs report "don't know" instead of hallucinating, they're limited to very low risk applications such as advertising and search.
Now, the promising feature of LLMs in this direction is the ability to use the context of previous questions and answers. It's still query/response, but with enough context that the user can gradually make the system converge on a useful result. Such systems are useful for "I don't know what I want but I'll know it when I see it" problems. This allows using flaky LLMs with human assistance to get a useful result.
> Both of those are systems which had to work right. Large language models are not even close to being safe to use in such applications. Until LLMs report "don't know" instead of hallucinating, they're limited to very low risk applications such as advertising and search.
Are humans limited to low-risk applications like that?
Because humans, even some of the most humble, will still assert things they THINK are true, but are patently untrue, based on misunderstandings, faulty memories, confused reasoning, and a plethora of others.
I can't count the number of times I've had conversations with extremely experienced, smart techies who just spout off the most ignorant stuff.
And I don't want to count the number of times I've personally done that, but I'm sure it's >0. And I hate to tell you, but I've spent the last 20 years in positions of authority where I could have caused massive amounts of damage not only to the companies I've been employed by, but to a large cross-section of society as well. And those fools I referenced in the last paragraph? Same.
I think people are too hasty to discount LLMs, or LLM-backed agents, or other LLM-based applications because of their limitations.
(Related: I think people are too hasty to discount the catastrophic potential of self-modifying AGI as well)
Can people please stop making this comment in reply to EVERY criticism of LLMs? "Humans are flawed too".
We do not normally hallucinate. We are sometimes wrong, and sometimes are wrong about the confidence they should attach to their knowledge. But we do not simply hallucinate and spout fully confidence nonsense constantly. That is what LLMs.
You remember a few isolated incidents because they're salient. That does not mean that it's representative of your average personal interactions.
>We do not normally hallucinate.
Oh yes we do lol. Many experiments show our perception of reality and of cognition is entirely divorced from the reality of what's really going on.
Your brain is making stuff up all the time. Sense data you perceive is partly fabricated. Your memories are partly fabricated. Your decision rationales are post hoc rationalizations more often than not. That is, you don't genuinely know why you make certain decisions or what preferences actually inform them. You just think you do. You can't recreate previous mental states. You are not usually aware. But it is happening.
LLMs are just undoubtedly worse right now.
> We do not normally hallucinate. We are sometimes wrong, and sometimes are wrong about the confidence they should attach to their knowledge. But we do not simply hallucinate and spout fully confidence nonsense constantly. That is what LLMs.
In my average interaction with GPT 4 there are far fewer errors than in this paragraph. I would say that here you in fact "spout fully confidence nonsense" (sic).
Some humans are better than others at saying things that are correct, and at saying things with appropriately calibrated confidence. Some LLMs are better than some humans in some situations at doing these things.
You seem to be hung up on the word "hallucinate". It is, indeed, not a great word and many researchers are a bit annoyed that's the term that's stuck. It simply means for an LLM to state something that's incorrect as if it's true.
The times that LLMs do this do stand out, because "You remember a few isolated incidents because they're salient".
The equivalent of hallucinations in LLMs is false memories [1] in people. They happen all the time.
[1] https://en.wikipedia.org/wiki/False_memory
> Are humans limited to low-risk applications like that?
No, but arguably civilization consists of mechanisms to manage human fallibility (separation of powers, bicameralism, "democracy", bureaucracy, regulations, etc). We might not fully understand why, but we've found methods that sorta kinda "work".
> could have caused
That's why they didn't.
> No, but arguably civilization consists of mechanisms to manage human fallibility
Exactly. Civilization is, arguably, one big exercise in reducing variance in individuals, as low variance and high predictability is what lets us work together and trust each other, instead of seeing each other as threats and hiding from each other (or trying to preemptively attack). The more something or someone is unpredictable, the more we see it or them as a threat.
> (separation of powers, bicameralism, "democracy", bureaucracy, regulations, etc).
And on the more individual scale: culture, social customs and the public school system are all forces that shape humans from the youngest age, reducing variance in thoughts and behaviors. Exams of all kinds, including psychological ones, prevent high-variance individuals from being able to do large amounts of harm to others. The higher the danger, the higher the bar.
There are tests you need to pass to be able to own and drive a car. There are tests you may need to pass to own a firearm. There are more tests still before you'll be allowed to fly an aircraft. Those tests are not there just to check your skills - they also filter out high-variance individuals, people who cannot be safely given the responsibility to operate dangerous tools.
Further still, society has mechanisms to eliminate high-variance outliers. Lighter cases may get some kind of medical or spiritual treatment, and (with gates in place to keep them away from guns and planes) it works out OK. More difficult cases eventually get locked up in prisons or mental hospitals. While there are a lot of specific things to discuss about the prison and mental care systems, their general, high-level function is simple: they keep both predictably dangerous and high-variance (i.e. unpredictably dangerous) people stashed safely away, where they can't disrupt or harm others at scale.
> We might not fully understand why, but we've found methods that sorta kinda "work".
Yes, we've found many such methods at every level - individual, familial, tribal, national - and we stack them all on top of each other. This creates the conditions that let us live in larger groups, with fewer conflicts, as well as to safely use increasingly powerful (i.e. potentially destructive) technologies.
>Because humans, even some of the most humble, will still assert things they THINK are true, but are patently untrue, based on misunderstandings, faulty memories, confused reasoning, and a plethora of others.
> I can't count the number of times I've had conversations with extremely experienced, smart techies who just spout off the most ignorant stuff.
Spouting off the most ignorant stuff is one of the lowest-risk things you can do in general. We're talking about running code where a bug can do a ton of damage, financial or otherwise, not water-cooler conversations.
In the train example, the UI is in place to prevent a person from setting up a dangerous route. I think the idea here is that an LLM cannot take the place of such a UI, as LLMs are inherently unreliable.
To your point, humans are augmented by checklists and custom processes in critical situations. And there are certainly applications that mimic such safety checklists. We don't NEED to start from an LLM perspective if our goal is different and doesn't benefit from an LLM. Not all UIs or architectures are fit for all purposes.
Couldn’t you make this same argument with a chat bot that wasn’t an LLM at all?
“Yes, it may have responded with total nonsense just now, but who among us can say they’ve never done the same in conversation?”
> Are humans limited to low-risk applications like that?
Yes, of course. That's why the systems the parent mentioned designed humans out of the safety-critical loop.
> Because humans, even some of the most humble, will still assert things they THINK are true, but are patently untrue, based on misunderstandings, faulty memories, confused reasoning, and a plethora of others.
> I can't count the number of times I've had conversations with extremely experienced, smart techies who just spout off the most ignorant stuff.
The key difference is that when the human you're having a conversation with states something, you're able to ascertain the likelihood of it being true based on available context: How well do you know them? How knowledgeable are they about the subject matter? Does their body language indicate uncertainty? Have they historically been a reliable source of information?
No such introspection is possible with LLMs. Any part of anything they say could be wrong and to any degree!
I wholeheartedly agree with the main thrust of your comment. Care to expand on your (related: potential catastrophe) opinion?
When you say train dispatching and control panels, I think you've illustrated how confused this whole discussion is. There should be a separate term, "operator interface", distinct from "user interface": UIs have never had any locus of control, because they're for users, and operators are the ones in control. Requesting that an LLM do something is like pressing the button to close the doors of an elevator. Do you feel in charge?
Oh my. This is the first time I've seen this kind of distinction between "users" and "operators" in context of a single system. I kind of always assumed that "operator" is just a synonym for "user" in industries/contexts that are dealing with tools instead of toys.
But this absolutely makes sense, and it is a succinct description for the complaints some of us frequently make about modern UI trends: bad interfaces are the ones that make us feel like "users", where we expect to be "operators".
I’ve seen such a distinction before, but I’ve been around telescopes and particle accelerators. Single system, but different roles in the same system with a different UI.
Oh snap, did I just pull back the curtain?
> UIs have never had any locus of control, because they're for users, and operators are the ones in control.
Not really any more. The control systems for almost everything complicated now look like ordinary desktop or phone user interfaces. Train dispatching centers, police dispatching centers, and power dispatching centers all look rather similar today.
That's because they're computer users.
I’d love to understand the relevance of this comment, but I sincerely don’t.
You describe two cases that are specially designed to anticipate the needs of professionals operating a system. That’s automation, sure, but not AI. The system doesn’t even ostensibly understand user intent; it’s still simply and obviously deterministic, granted complex.
Do you have an underlying assumption that tech should only be for solving professional problems?
The context Nielsen comes from is the field of Human-Computer Interaction, which to me is about a more varied usage context than that.
LLMs have flaws, sure.
But how does all this at all relate to the paradigm development the article discusses?
LLMs have flaws but they are exceptionally good at transforming data or outputting data in the format I want.
I once asked ChatGPT to tabulate the calories of different foods. I then asked it to convert the table to CSV. I even asked it to provide SQL insert statements for the same table. Now the data might be incorrect but the transformation of that data never was.
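To be concrete about the kind of transform I mean, here's the same table -> CSV -> SQL shape done deterministically (the foods and calorie numbers below are just placeholders):

    import csv, io

    # The same table -> CSV -> SQL INSERT transform, done by hand so the
    # model's output has something to be checked against. Placeholder values.
    rows = [
        {"food": "apple",  "calories_per_100g": 52},
        {"food": "banana", "calories_per_100g": 89},
    ]

    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["food", "calories_per_100g"])
    writer.writeheader()
    writer.writerows(rows)
    print(buf.getvalue())        # the table as CSV

    for r in rows:               # the table as SQL INSERT statements
        print(f"INSERT INTO foods (food, calories_per_100g) "
              f"VALUES ('{r['food']}', {r['calories_per_100g']});")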
This works with complex transforms as well, like asking it to create a docker compose file from a docker run or podman run command and vice versa. Occasionally the transform would be wrong, but then you'd realise it was just out of date with a newer format, which is expected because its knowledge is limited to 2021.
Hallucinations will be tamed, I think. Only a matter of time (~3 to 5 years [0]) given the amount of research going into it?
With that in mind, ambient computing has always threatened to be the next frontier in Human-Computer Interaction. Siri, Google Assistant, Alexa, and G Home predate today's LLM hype. Dare I say, the hype is real.
As a consumer, GPT4 has shown capabilities far beyond whatever preceded it (with the exception of Google Translate). And from what Sam has been saying in the interviews, newer multi-modal GPTs are going to be exponentially better: https://youtube.com/watch?v=H1hdQdcM-H4s&t=380s
[0] https://twitter.com/mustafasuleymn/status/166948190798020608...
> Hallucinations will be tamed, I think.
I don't think that's likely unless there was a latent space of "Truth" which could be discovered through the right model.
That would be a far more revolutionary discovery than anyone can possibly imagine. For starters the last 300+ years of Western Philosophy would be essentially proven unequivocally wrong.
edit: If you're going to downvote this please elaborate. LLMs currently operate by sampling from a latent semantic space and then decoding that back into language. In order for models to know the "truth", there would have to be a latent space of "true statements" that was effectively directly observable. All points along that surface would represent "true" statements, and that would be the most radical human discovery in the history of the species.
There may not be a surface directly encoding the "truth" value, but unless we assume that the training data LLMs are trained on is entirely uncorrelated with the truth, there should be a surface that's close enough.
I don't think the assumption that LLM training data is random with respect to truth value is reasonable - people don't write random text for no reason at all. Even if the current training corpus was too noisy for the "truth surface" to become clear - e.g. because it's full of shitposting and people exchanging their misconceptions about things - a better-curated corpus should do the trick.
Also, I don't see how this idea would invalidate the last couple centuries of Western philosophy. The "truth surface", should it exist, would not be following some innate truth property of statements - it would only be reflecting the fact that the statements used in training were positively correlated with truth.
EDIT: And yes, this would be a huge thing - but not because of some fundamental philosophical reasons, but rather because it would be an effective way to pull truths and correlations from aggregated beliefs of large number of people. It's what humans do when they synthesize information, but at a much larger scale, one we can't match mostly because we don't live long enough.
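To illustrate what I mean by a "surface": if statement embeddings carried any signal correlated with truth, even a simple linear probe could approximate a separating hyperplane. A toy sketch, with random stand-in embeddings rather than real model activations:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # Toy version of the "truth surface" idea: IF statement embeddings carried
    # a signal correlated with truth, a linear probe could approximate a
    # separating hyperplane. These embeddings are synthetic; high accuracy here
    # only reflects how the toy data was built, not a claim about real LLM latents.
    rng = np.random.default_rng(0)
    dim, n = 64, 500

    truth_direction = rng.normal(size=dim)      # pretend this direction tracks truth
    labels = rng.integers(0, 2, size=n)         # 1 = "true" statement, 0 = "false"
    embeddings = rng.normal(size=(n, dim)) + np.outer(labels - 0.5, truth_direction)

    probe = LogisticRegression(max_iter=1000).fit(embeddings, labels)
    print("probe accuracy on toy data:", probe.score(embeddings, labels))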
> I don't think that's likely unless there was a latent space of "Truth" which could be discovered through the right model.
For many medium-sized problems, there is. "Operate car accessories" is a good example. So is "book travel".
Verifiability is a much easier concept than Truth. It's sufficient at least 80-90% of the time for an AI to know whether something is reasonably verifiable, rather than whether it is true. Of course, with sufficient amounts of misinformation and disagreement over which sources can be used for verifiability it's a more complicated act in practice.
> Hallucinations will be tamed.
I hope so. But so far, most of the proposals seem to involve bolting something on the outside of the black box of the LLM itself.
If medium-sized language models can be made hallucination-free, we'll see more applications. A base language model that has most of the language but doesn't try to contain all human knowledge, plus a special purpose model for the task at hand, would be very useful if reliable. That's what you need for car controls, customer service, and similar interaction.
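Something like this shape, roughly - the model may only pick from a fixed command whitelist, and anything outside it is refused rather than guessed at (the function and command names here are stand-ins, not any real product's API):

    # The special-purpose layer only lets the model pick from a fixed command
    # whitelist; anything outside it is refused rather than guessed at.
    # `ask_small_model` is a stand-in for whatever model does the mapping.
    ALLOWED_COMMANDS = {
        "wipers_on", "wipers_off",
        "defrost_on", "defrost_off",
        "seat_heat_up", "seat_heat_down",
    }

    def ask_small_model(utterance: str) -> str:
        # Placeholder for a real model call; here just a crude keyword lookup.
        if "wiper" in utterance.lower():
            return "wipers_on"
        return "unknown"

    def interpret(utterance: str) -> str:
        candidate = ask_small_model(utterance)
        if candidate not in ALLOWED_COMMANDS:
            return "Sorry, I can't do that."   # report "don't know" instead of guessing
        return candidate

    print(interpret("turn on the wipers"))     # -> wipers_on
    print(interpret("order me a pizza"))       # -> Sorry, I can't do that.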
> But so far, most of the proposals seem to involve bolting something on the outside of the black box of the LLM itself.
This might be the only way. I maintain that, if we're making analogies to humans, then LLMs best fit as the equivalent of one's inner voice - the thing sitting at the border between the conscious and the (un/sub)conscious, which surfaces thoughts in the form of language - the "stream of consciousness". The instinctive, gut-feel responses which... you typically don't voice, because they tend to sound right but usually aren't. Much like we do extra processing, conscious or otherwise, to turn that stream of consciousness into something reasonably correct, I feel the future of LLMs is to be a component of a system, surrounded by additional layers that process the LLM's output, or do a back-and-forth with it, until something reasonably certain and free of hallucinations is reached.
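Very roughly, that outer loop might look like this (generate and critique are placeholders standing in for real model calls):

    # Draft an answer, run a critique pass over it, revise, and only return
    # once the critic stops objecting - otherwise fall back to "don't know".
    # Both functions are placeholders standing in for real model calls.
    def generate(prompt: str) -> str:
        return "revised answer" if "Fix these issues" in prompt else "draft answer"

    def critique(answer: str) -> list:
        # A real critic might be the same LLM asked to list unsupported claims.
        return ["contains an unsupported claim"] if answer.startswith("draft") else []

    def answer_with_review(prompt: str, max_rounds: int = 3) -> str:
        answer = generate(prompt)
        for _ in range(max_rounds):
            problems = critique(answer)
            if not problems:
                return answer
            answer = generate(prompt + "\nFix these issues: " + ", ".join(problems))
        return "I don't know."  # refuse rather than return a still-flagged answer

    print(answer_with_review("Summarize the NX interlocking system."))  # -> revised answer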
Karpathy explained how LLMs can retrospectively assess their own output and judge if they were wrong.
Source: https://www.youtube.com/watch?v=bZQun8Y4L2A&t=1607s
Those fall under the second category in the article. No different from using a command line application and passing in a set of parameters and receiving an output.
Sometimes a headline is all you need. Often times people won't read past the headline.