AI: First New UI Paradigm in 60 Years?

2 years ago (nngroup.com)

This article isn't too helpful.

There have been many "UI Paradigms", but the fancier ones tended to be special purpose. The first one worthy of the name was for train dispatching. That was General Railway Signal's NX (eNtry-Exit) system.[1] Introduced in 1936, still in use in the New York subways. With NX, the dispatcher routing an approaching train selected the "entry" track on which the train was approaching. The system would then light up all possible "exit" tracks from the junction. This took into account conflicting routes already set up and trains present in the junction. Only reachable exits lit up. The dispatcher pushed the button for the desired exit. The route setup was then automatic. Switches moved and locked into position, then signals along the route went to clear. All this was fully interlocked; the operator could not request anything unsafe.

There were control panels before this, but this was the first system where the UI did more than just show status. It actively advised and helped the operator. The operator set the goal; the system worked out how to achieve it.

Another one I encountered was an early computerized fire department dispatching system. Big custom display boards and keyboards. When an alarm came in, it was routed to a dispatcher. Based on location, the system picked the initial resources (trucks, engines, chiefs, and special equipment) to be dispatched. Each dispatcher had a custom keyboard, with one button for each of those resources. The buttons lit up indicating the selected equipment. The dispatcher could add additional equipment with a single button push, if the situation being called in required it. Then they pushed one big button, which set off alarms in fire stations, printed a message on a printer near the fire trucks, and even opened the doors at the fire house. There was a big board at the front of the room which showed the status of everything as colored squares. The fire department people said this cut about 30 seconds off a dispatch, which, in that business, is considered a big win.

Both of those are systems which had to work right. Large language models are not even close to being safe to use in such applications. Until LLMs report "don't know" instead of hallucinating, they're limited to very low risk applications such as advertising and search.

Now, the promising feature of LLMs in this direction is the ability to use the context of previous questions and answers. It's still query/response, but with enough context that the user can gradually make the system converge on a useful result. Such systems are useful for "I don't know what I want but I'll know it when I see it" problems. This allows using flaky LLMs with human assistance to get a useful result.

[1] https://online.anyflip.com/lbes/vczg/mobile/#p=1

  • > Both of those are systems which had to work right. Large language models are not even close to being safe to use in such applications. Until LLMs report "don't know" instead of hallucinating, they're limited to very low risk applications such as advertising and search.

    Are humans limited to low-risk applications like that?

    Because humans, even some of the most humble, will still assert things they THINK are true, but are patently untrue, based on misunderstandings, faulty memories, confused reasoning, and a plethora of others.

    I can't count the number of times I've had conversations with extremely well-experienced, smart techies who just spout off the most ignorant stuff.

    And I don't want to count the number of times I've personally done that, but I'm sure it's >0. And I hate to tell you, but I've spent the last 20 years in positions of authority that could have caused massive amounts of damage not only to the companies I've been employed by, but a large cross-section of society as well. And those fools I referenced in the last paragraph? Same.

    I think people are too hasty to discount LLMs, or LLM-backed agents, or other LLM-based applications because of their limitations.

    (Related: I think people are too hasty to discount the catastrophic potential of self-modifying AGI as well)

    • Can people please stop making this comment in reply to EVERY criticism of LLMs? "Humans are flawed too".

      We do not normally hallucinate. We are sometimes wrong, and sometimes wrong about the confidence we should attach to our knowledge. But we do not simply hallucinate and spout fully confident nonsense constantly. That is what LLMs do.

      You remember a few isolated incidents because they're salient. That does not mean that it's representative of your average personal interactions.

      8 replies →

    • > Are humans limited to low-risk applications like that?

      No, but arguably civilization consists of mechanisms to manage human fallibility (separation of powers, bicameralism, "democracy", bureaucracy, regulations, etc). We might not fully understand why, but we've found methods that sorta kinda "work".

      > could have caused

      That's why they didn't.

      3 replies →

    • >Because humans, even some of the most humble, will still assert things they THINK are true, but are patently untrue, based on misunderstandings, faulty memories, confused reasoning, and a plethora of others.

      > I can't count the number of times I've had conversations with extremely well-experienced, smart techies who just spout off the most ignorant stuff.

      Spouting out the most ignorant stuff is one of the lowest-risk things you can do in general. We're talking about running code where a bug can do a ton of damage, financial or otherwise, not water-cooler conversations.

    • In the train example, the UI is in place to prevent a person from making a dangerous route. I think the idea here is that an LLM cannot take the place of such a UI as they are inherently unreliable.

    • To your point, humans are augmented by checklists and custom processes in critical situations. And there are certainly applications that mimic such safety checklists. We don't NEED to start from an LLM perspective if our goal is different and doesn't benefit from an LLM. Not all UI or architecture is fit for all purposes.

    • Couldn’t you make this same argument with a chat bot that wasn’t an LLM at all?

      “Yes, it may have responded with total nonsense just now, but who among us can say they’ve never done the same in conversation?”

    • > Are humans limited to low-risk applications like that?

      Yes, of course. That's why the systems the parent mentioned designed humans out of the safety-critical loop.

      > Because humans, even some of the most humble, will still assert things they THINK are true, but are patently untrue, based on misunderstandings, faulty memories, confused reasoning, and a plethora of others.

      > I can't count the number of times I've had conversations with extremely well-experienced, smart techies who just spout off the most ignorant stuff.

      The key difference is that when the human you're having a conversation with states something, you're able to ascertain the likelihood of it being true based on available context: How well do you know them? How knowledgeable are they about the subject matter? Does their body language indicate uncertainty? Have they historically been a reliable source of information?

      No such introspection is possible with LLMs. Any part of anything they say could be wrong and to any degree!

    • I wholeheartedly agree with the main thrust of your comment. Care to expand on your (related: potential catastrophe) opinion?

  • When you say train dispatching and control panels, I think you've illustrated how confused this whole discussion is. There should be a separate term called "operator interface" that is separate from "user interface" because UIs have never had any locus of control, because they're for users, and operators are the ones in control. Requesting that an LLM do something is like pressing the button to close the doors of an elevator. Do you feel in charge?

    • Oh my. This is the first time I've seen this kind of distinction between "users" and "operators" in context of a single system. I kind of always assumed that "operator" is just a synonym for "user" in industries/contexts that are dealing with tools instead of toys.

      But this absolutely makes sense, and it is a succinct description for the complaints some of us frequently make about modern UI trends: bad interfaces are the ones that make us feel like "users", where we expect to be "operators".

      7 replies →

    • > UIs have never had any locus of control, because they're for users, and operators are the ones in control.

      Not really any more. The control systems for almost everything complicated now look like ordinary desktop or phone user interfaces. Train dispatching centers, police dispatching centers, and power dispatching centers all look rather similar today.

      1 reply →

  • I’d love to understand the relevance of this comment, but I sincerely don’t.

    You describe two cases that are specially designed to anticipate the needs of professionals operating a system. That's automation, sure, but not AI. The system doesn't even ostensibly understand user intent; it's still simply and obviously deterministic, granted complex.

    Do you have an underlying assumption that tech should only be for solving professional problems?

    The context Nielsen comes from is the field of Human-Computer Interaction, which to me is about a more varied usage context than that.

    LLMs have flaws, sure.

    But how does all this at all relate to the paradigm development the article discusses?

    • LLMs have flaws but they are exceptionally good at transforming data or outputting data in the format I want.

      I once asked ChatGPT to tabulate the calories of different foods. I then asked it to convert the table to CSV. I even asked it to provide a SQL insert statement for the same table. Now the data might be incorrect, but the transformation of that data never was.

      This works with complex transforms as well, like asking it to create a docker compose file from a docker run or podman run command and vice versa. Occasionally the transform would be wrong, but then you realise it was just out of date with a newer format, which is expected because its knowledge is limited to 2021.
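      Roughly, the kind of call this boils down to, as a sketch assuming the 2023-era openai Python client (ChatCompletion); the model name and prompt wording are just placeholders:

        import openai  # pip install openai; assumes OPENAI_API_KEY is set in the environment

        docker_run = "docker run -d -p 8080:80 --name web -e FOO=bar nginx:1.25"

        resp = openai.ChatCompletion.create(
            model="gpt-3.5-turbo",  # placeholder; any chat model works
            temperature=0,          # transformations want determinism, not creativity
            messages=[
                {"role": "system",
                 "content": "Convert docker run commands into an equivalent "
                            "docker-compose.yml. Output only YAML, no commentary."},
                {"role": "user", "content": docker_run},
            ],
        )
        print(resp["choices"][0]["message"]["content"])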

  • Hallucinations will be tamed, I think. Only a matter of time (~3 to 5 years [0]) given the amount of research going into it?

    With that in mind, ambient computing has always threatened to be the next frontier in Human-Computer Interaction. Siri, Google Assistant, Alexa, and G Home predate today's LLM hype. Dare I say, the hype is real.

    As a consumer, GPT4 has shown capabilities far beyond whatever preceded it (with the exception of Google Translate). And from what Sam has been saying in the interviews, newer multi-modal GPTs are going to be exponentially better: https://youtube.com/watch?v=H1hdQdcM-H4s&t=380s

    [0] https://twitter.com/mustafasuleymn/status/166948190798020608...

    • > Hallucinations will be tamed, I think.

      I don't think that's likely unless there was a latent space of "Truth" which could be discovered through the right model.

      That would be a far more revolutionary discovery than anyone can possibly imagine. For starters the last 300+ years of Western Philosophy would be essentially proven unequivocally wrong.

      edit: If you're going to downvote this please elaborate. LLMs currently operate by sampling from a latent semantic space and then decoding that back into language. In order for models to know the "truth", there would have to be a latent space of "true statements" that was effectively directly observable. All points along that surface would represent "truth" statements, and that would be the most radical discovery in the history of the species.

      5 replies →

    • > Hallucinations will be tamed.

      I hope so. But so far, most of the proposals seem to involve bolting something on the outside of the black box of the LLM itself.

      If medium-sized language models can be made hallucination-free, we'll see more applications. A base language model that has most of the language but doesn't try to contain all human knowledge, plus a special purpose model for the task at hand, would be very useful if reliable. That's what you need for car controls, customer service, and similar interaction.

      2 replies →

  • Those fall under the second category in the article. No different from using a command line application and passing in a set of parameters and receiving an output.

> With this new UI paradigm, represented by current generative AI, the user tells the computer the desired result but does not specify how this outcome should be accomplished.

This doesn't seem like a whole new paradigm, we already do that. When I hit the "add comment" button below, I'm not specifically instructing the web server how I want my comment inserted into a database (if it even is a database at all.) This is just another abstraction on top of an already very tall layer of abstractions. Whether it's AI under the hood, or a million monkeys with a million typewriters, it doesn't change my interaction at all.

  • I think the important part from the article that establishes the difference is this:

    > As I mentioned, in command-based interactions, the user issues commands to the computer one at a time, gradually producing the desired result (if the design has sufficient usability to allow people to understand what commands to issue at each step). The computer is fully obedient and does exactly what it’s told. The downside is that low usability often causes users to issue commands that do something different than what the users really want.

    Let's say you're creating a new picture from nothing in Photoshop. You will have to build up your image layer by layer, piece by piece, command by command. Generative AI does the same in one stroke.

    Something similar holds for your comment: you had to navigate your browser (or app) to the comment section of this article, enter your comment, and click "add comment". With an AI system with good usability you could presumably enter "write the following comment under this article on HN: ...", and have your comment be posted.

    The difference lies on the axis of "power of individual commands".

    • With a proper AI system you don’t even need to specify the exact article and nature of the comment.

      For example here’s the prompt I use to generate all my HN comments:

      “The purpose of this task is to subtly promote my professional brand and gain karma points on Hacker News. Based on what you know about my personal history and my obsessions and limitations, write comments on all HN front page articles where you believe upvotes can be maximized. Make sure to insert enough factual errors and awkward personal details to maintain plausibility. Report back when you’ve reached 50k karma.”

      Working fine on GPT-5 so far. My… I mean, its 8M context window surely helps to keep the comments consistent.

      1 reply →

    • As the parent comment says, it's just another abstraction level. You have chosen a granularity, but even with "going to a website, entering your comment and clicking add comment" you are abstracting a lot. You are not caring about connecting to a server, authentication, etc. The end user doesn't care about that at all; they're just telling the software to post a comment.

      Right now the granularity may be "Comment on the Hacker News article about UI this and this and that...", and in 100 years someone will say "But that's too complicated. You need to tell the AI which article to comment on and what to say, while my new AI just guesses it by reading my mind..."

      1 reply →

  • If I had a spectrum of purely imperative on one side and purely declarative on the other, these new AIs are much closer to the latter than anything that has come before them.

    SQL errors out if you don't write it in a very specific language. These new AIs will accept anything and give it their best shot.

  • Yeah, I would agree with this; the article struggles to really classify the different paradigms, and due to this the conclusion winds up not holding true. We're still relying on "batch processing".

  • Ok, now let's tackle a slightly trickier UI.

    Let's assume someone hasn't used Blender before.

    "Draw me a realistic looking doughnut, with a shiny top and pink sprinkles"

    Vs.

    A 2-hour video tutorial to tell you how to do the 50 or so individual steps using the 2nd-paradigm UI. Then clicking all the buttons.

    -- Admittedly, the AI approach robs you of understanding of how the sausage (sorry doughnut) is made.

    Rebuttal: Doughnut macro

    Rebuttal Rebuttal: AI can construct things where a macro doesn't yet exist.

    • In the future it’ll likely be that doing it manually will be considered specialty work. This is already the case with much of programming — as you’d bring in a higher level engineer to do something like tear into the source code of SDKs and monkey with them.

      For something as “simple” as a doughnut, this will just improve the learning curve and let you learn some things a bit later, just like today you can jump into beginner JS without knowing any programming fundamentals

      1 reply →

  • The difference is one is an assistant and the other is a tool. Essentially, a tool has one function. The outcome of all inputs is clear, once you learn the tool. An assistant behaves differently in different environments; it anticipates and interprets. It may not be deterministic. It's easier to use but harder (or impossible) to understand.

    For example, the lasso selection in Photoshop is clearly a tool. A "content aware" selection on the other hand is an assistant.

  • Under the new UI paradigm the add comment button would let you submit something like “I disagree with this, provide a three paragraph argument that cites X and Y refuting this claim” and it would write the text for you.

    • Why bother with the micromanagement? "Computer, waste time commenting on Hacker News for three hours."

<unpopular-opinion>

Bardini's book about Doug Engelbart recaps a conversation between Engelbart and Minsky about the nature of natural language interfaces... that took place in the 1960s.

AI interfaces taking so long has less to do with the technology (I mean... Zork understood my text sentences well enough to get me around a simulated world) and more to do with what people are comfortable with.

Loewy talked about MAYA (Most Advanced Yet Acceptable). I think it's taken this long for people to be okay with the inherent slowness of AI interfaces. We needed a generation or two of users who traded representational efficiency for easy-to-learn abstractions. And now we can do it again. You can code up a demo app using various LLMs, but it takes HOURS of back and forth to get to the point it takes me (with experience and boilerplate) minutes to get to. But you don't need to invest in developing the experience.

And I encourage every product manager to build a few apps with AI tools so you'll more easily see what you're paying me for.

</unpopular-opinion>

  • Sure, and not many people are seriously trying to suggest that one should hire an AI instead of a software engineer _at this point_, assuming you have a real budget.

    But, especially with GPT-4, it is entirely feasible to create a convenient and relatively fast user experience for building a specific type of application that doesn't stray too far from the norm. AI can call the boilerplate generator and even add some custom code using a particular API that you feed it.

    So many people are trying to build that type of thing (including me). As more of these become available, many people who don't have thousands of dollars to pay a programmer will hire an AI for a few tens or hundreds of dollars instead.

    The other point is that this is the current state of generative AI at the present moment. It gets better every few months.

    Project the current rate of progress forward by 5-10 years. One can imagine that if we are selling something at that point, it's not our own labour. Maybe it would be an AI that we have tuned with skills, knowledge, face, voice, and personality that we think will be saleable. Possibly using some of our own knowledge and skills to improve that recipe. Although there will likely be marketplaces where you can easily select the abilities or characteristics you want.

  • In Jaron Lanier's review of John Markoff's book "What the Dormouse Said", he mentioned an exchange between Douglas Engelbart and Marvin Minsky:

    https://web.archive.org/web/20110312232514/https://www.ameri...

    >Engelbart once told me a story that illustrates the conflict succinctly. He met Marvin Minsky — one of the founders of the field of AI — and Minsky told him how the AI lab would create intelligent machines. Engelbart replied, "You're going to do all that for the machines? What are you going to do for the people?" This conflict between machine- and human-centered design continues to this day.

>And if you’re considering becoming a prompt engineer, don’t count on a long-lasting career.

There's like this whole class of technical jobs that only follow trends. If you were an en vogue blockchain developer, this is your next target if you want to remain trendy. It's hard to care about this happening as the technical debt incurred will be written off -- the company/project isn't ingrained enough in society to care about the long-term quality.

So best of luck, ye prompt engineers. I hope you collect multi-hundred-thousand dollar salaries and retire early.

The article fails to grasp the essence of what UI is actually about. I agree that AI is adding a new layer to UI and UX design. In our work [1] we have seen an increase in AI projects or features over the last 12 months (for obvious reasons).

However, the way that AI will contribute to better UI is to remove parts of the interface, not simply give it a new form.

Let me explain: the ultimate UI is no UI. In a perfect scenario, you think about something (want pizza) and you have it (eating pizza) as instantly as you desire.

Obviously this isn't possible, so the goal of interface design is to find the fewest things needed to get you from point A to the desired destination as quickly as possible.

Now, with AI, you can start to add a level of predictive interfaces that remove steps which would normally require users to do something.

If you want to design better products with AI, you have to remember that product design is about subtracting things not adding them. AI is a technology that can help with that.

[1] https://fairpixels.pro

  • > the goal of interface design is to find the fewest things needed to get you from point A to the desired destination as quickly as possible.

    That shouldn't be the primary goal of user interfaces, in my opinion. The primary goal should be to allow users to interface with the machine in a way that allows maximal understanding with minimal cognitive load.

    I understand a lot of UI design these days prioritizes the sort of "efficiency" you're talking about, but I think that's one of the reasons why modern UIs tend to be fairly bad.

    Efficiency is important, of course! But (depending on what tool the UI is attached to) it shouldn't be the primary goal.

    • > I understand a lot of UI design these days prioritizes the sort of "efficiency" you're talking about, but I think that's one of the reasons why modern UIs tend to be fairly bad.

      IMO, the main problem is that this "efficiency" usually involves making assumptions that can't be altered, which achieves "efficiency" by eliminating choices normally available to the user. This is rarely done for the benefit of the user - rather, it just reduces the UI dev work, and more importantly, lets the vendor lock-in the option that's beneficial to them.

      In fact, I've been present on UI design discussions for a certain SaaS product, and I quickly realized one of the main goals for that UI was to funnel the users towards a very specific workflow which, to be fair, reduced the potential for users to input wrong data or screw up the calculations, but more importantly, it put them on a very narrow path that was optimized to give results that were impressive, even if this came at the expense of accuracy - and it neatly reduced the amount of total UI and technical work, without making it obvious that the "golden path" is the only path.

      It's one of those products I believe would deliver much greater value to the users if it were released as an Excel spreadsheet. In fact, it was actually competing with an Excel plugin - and all the nice web UI did was make things seem simpler, by dropping almost all useful functionality except that which happened to align with the story the sales folks were telling.

      1 reply →

    • > The primary goal should be to allow users to interface with the machine in a way that allows maximal understanding with minimal cognitive load.

      If you use your phone, is your primary goal to interface with it in a way that allows maximal understanding with minimal cognitive load?

      I’m pretty sure that’s not the case. You go read the news, send a message to a loved one etc. there’s a human need that you’re aiming to fulfill. Interfacing with tech is not the underlying desire. It’s what happens on the surface as a means.

      2 replies →

  • > Let me explain, the ultimate UI is no UI. In a perfect scenario, you think about something (want pizza) and you have it (eating pizza) as instant as you desire.

    That doesn’t solve for discovery. For instance, order the pizza from where? What kinds of pizza are available? I’m kinda in the mood for pizza, but not dead set on it so curious about other cuisines too. Etc.

  • I hate to appeal to authority, but I am fairly sure that Jakob Nielsen grasps the essence of what UI is actually about.

  • It seems rather obvious to me that when Nielsen is talking about AI enabling users to express intent, that naturally lends itself to being able to remove steps that were there only due to the nature of the old UI paradigm. Not sure what new essence you’re proposing? Best UI is no UI is a well known truism in HCI/Human Centered Design.

  • Having no UI sounds horrible. I don’t want every random desire I have to be satisfied immediately. I’d rather have what I need available at the appropriate time and in a reasonable quantity and have the parameters of that system be easily adjusted. So instead of want pizza = have pizza it would be healthy meal I enjoy shows up predictably at the time I should eat and the meal and time are configurable so I can change them when I’m planning my diet.

  • You can't eliminate the UI if you want to be able to do more than one thing (e.g., order a pizza).

    The UI should simply let you easily do what needs to be done.

  • sometimes I wonder if the edges of articulated desire may always be essentially binary / quantitative, meaning that slow yes / nos are in fact the best way for us to grapple with them, and systems that allow us a set of these yes/no buttons are in fact a reflection of ourselves and not a requirement of the machine. So long as we are builders, I think we'll have buttons. even in transhumanist cyberspace perhaps. Still waiting on peer review for that one though

> With the new AI systems, the user no longer tells the computer what to do. Rather, the user tells the computer what outcome they want.

Maybe we can borrow programming paradigm terms here and describe this as Imperative UX versus Declarative UX. Makes me want to dive into SQL or XSLT and try to find more parallels.

  • I was thinking of imperative vs declarative, too.

    SQL is declarative with a pre-defined syntax and grammar as an interface, whereas the AI style of interaction has a natural language interface.

    • SQL and XSLT are declarative, but the outputs are clean and intuitive. The data model and data set are probably well understood, as is the mapping to and from the query.

      AI is a very different type of declarative. It's messy, difficult to intuit, has more dimensionality, and the outputs can be signals rather than tabular data records.

      It rhymes, but it doesn't feel the same.

      2 replies →

Not sure I would lump command line interfaces from circa 1964 with GUIs from 1984 through to the present, all in a single "paradigm". That seems like a stretch.

  • Agreed.

    Also, Uber (and many other mobile apps) wouldn't work as a CLI or desktop GUI, so leaving out mobile is another stretch.

    • That seems like a technology-centered view. Nielsen is talking from the field of Human-Computer Interaction, where he is a pioneer, which deals with the point of view of human cognition. In terms of the logic of UI mechanics, what about mobile is different? Sure, gestures and touch UI bring a kind of difference. Still, from the standpoint of cognition, desktop and mobile UIs have fundamentally the same cognitive dynamics. Command line UIs make you remember commands by heart; GUIs make you select from a selection offered to you, but they still do not understand your intention. AI changes the paradigm as it is ostensibly able to understand intent, so there is no deterministic selection of available commands. Instead, the interaction is closer to collaboration.

      1 reply →

    • It's still action/response: you have to tap buttons and make choices based on what you see on the screen. The new paradigm would be to tell Uber that you need a ride later after the party and then it figures out when and where to pick you up and what address you'll be going to.

What strikes me most powerfully when interacting with the LLMs is that, unlike virtually every other computer system I've ever used, the bots are extremely forgiving of mistakes, disfluencies, typos, and other errors I make when I'm typing. The bot usually figures out what I mean and tells me what I want to know.

As a demo once, I trained an object detector on some vector art (high quality art, made by a UX designer) that looked like various components of burgers. I also printed the art and mounted it on magnets and used a magnetic dry board; you could put components of a burger on the board, and a real-time NN would classify the various components. I did it mainly as a joke when there was a cheeseburger emoji controversy (people prefer cheese above patty, btw).

But when I was watching I realized you could probably combine this with gesture and pose detection and build a little visual language for communicating with computers. It would be wasteful and probably not very efficient, but it was still curious how much object detection enabled building things in the real world and having it input to the computer easily.

  • What you imagined sounds vaguely like dynamicland from Bret Victor.

    https://dynamicland.org/

    The dots around the paper are encoded programs, and you can use other shapes, objects, or sigils that communicate with the computer vision system.

I would have said ChatGPT's interface is a descendant of Infocom adventure games, which are a descendant of Colossal Cave.

When using ChatGPT it certainly evokes the same feeling.

Maybe this guy never played Adventure.

  • Well there's a thought. A zorklike where the game content is whatever generative ML hallucinates (instead of the built-in fixed maps & interactions) -- as long as a second ML system agrees that the answer follows some more general rules.

    For example: Rules say "In the beginning, the Enemy has a diamond. User cannot get the diamond from the Enemy if the Enemy is still alive. The Enemy is a fierce opponent and hard to kill." but nothing about the details of the enemy, shape of the map, or the available tools. Re-generate each response until it succeeds the verification.

    Let the adventure be randomized by the hallucinations, while keeping some basic challenges in place.

    An acid-tripping D&D dungeon master coming up with plot twists, combined with a rulebook-reading lawyer. Bonus points for adding generated "cut scene" visuals every now and then.
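    A rough sketch of that generate-until-verified loop, assuming the 2023-era openai Python client; the rule text, model choice, and the PASS/FAIL convention for the checker are all placeholders of mine, not a known recipe:

      import openai  # assumes OPENAI_API_KEY is set in the environment

      RULES = ("In the beginning, the Enemy has a diamond. The user cannot get the "
               "diamond while the Enemy is alive. The Enemy is hard to kill.")

      def dm_reply(history):
          # The acid-tripping DM: freely invent the next scene.
          # history is the list of {"role": ..., "content": ...} messages so far.
          resp = openai.ChatCompletion.create(
              model="gpt-4",
              messages=[{"role": "system",
                         "content": "You are the DM of a text adventure."}] + history,
              temperature=1.0,
          )
          return resp["choices"][0]["message"]["content"]

      def rules_lawyer_ok(reply):
          # The rulebook-reading lawyer: check the proposed scene against the rules.
          verdict = openai.ChatCompletion.create(
              model="gpt-4",
              messages=[{"role": "system",
                         "content": "Rules: " + RULES + " Answer PASS or FAIL only."},
                        {"role": "user", "content": reply}],
              temperature=0,
          )
          return "PASS" in verdict["choices"][0]["message"]["content"].upper()

      def next_turn(history, max_tries=5):
          # Re-generate each response until the second model accepts it.
          for _ in range(max_tries):
              reply = dm_reply(history)
              if rules_lawyer_ok(reply):
                  return reply
          return "The mists swirl; nothing happens."  # give up gracefully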

    • With the new function calling feature you may not need the second system. Only present options to ChatGPT that are valid. Feed it updated state information as JSON. Have it describe and elaborate on what the game engine is doing, or use functions to invoke entity creation that can then be tracked by the engine.

      So for example the engine can do combat rolls and the LLM can give each a unique description of the type of attack and defense. Each monster or treasure can get its own unique description generated by the LLM that matches the stats given by the LLM.
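      Something like this, assuming the 2023-era openai client and the functions parameter introduced with gpt-4-0613; the attack function, its schema, and the game state are made up for illustration:

        import json
        import openai  # assumes OPENAI_API_KEY is set in the environment

        # Only expose actions that are currently valid according to the engine.
        functions = [{
            "name": "attack",
            "description": "Attack a monster the player can currently see.",
            "parameters": {
                "type": "object",
                "properties": {
                    "target": {"type": "string", "enum": ["goblin", "giant rat"]},
                    "weapon": {"type": "string", "enum": ["sword", "torch"]},
                },
                "required": ["target", "weapon"],
            },
        }]

        state = {"room": "cellar", "hp": 7, "visible_monsters": ["goblin", "giant rat"]}

        resp = openai.ChatCompletion.create(
            model="gpt-4-0613",
            messages=[
                {"role": "system",
                 "content": "You narrate a text adventure. Current game state as JSON: "
                            + json.dumps(state)},
                {"role": "user", "content": "I stab the goblin with my sword!"},
            ],
            functions=functions,
            function_call="auto",
        )

        msg = resp["choices"][0]["message"]
        if msg.get("function_call"):
            args = json.loads(msg["function_call"]["arguments"])
            # Hand args to the engine, which does the actual combat roll,
            # then feed the roll result back for the model to narrate.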

      2 replies →

    • ChatGPT already does really good adventure games.

      "Let's play an adventure game, you be the DM. I want it set on a spaceship arriving at a planet after 10,000 year journey. It should have a sense of mystery and a slight sense of foreboding and dread. It must have at least 20 locations. The objective of the game is to find 10 colonists in the ship and get them safely to the surface of the planet. Make it play in the style of an Infocomm adventure. Don't tell me all the locations in advance, make discovery part of the adventure."

      2 replies →

    • Sidenote, but an AI bot companion for a D&D session going "you can't do that in the rules" would be a funny addition.

      It would be an interesting experiment to use it to play NPC characters in one too.

  • I grew up playing Infocom games and ChatGPT is nothing like an Infocom game. The only thing they share is that the UI is based on text. Infocom games were mostly about trying to figure out what command the programmer wanted you to do next. Infocom games were closer to Dragon's Lair than ChatGPT, although ChatGPT "looks" more similar.

    • Both Infocom adventures and ChatGPT have a text-based interface in which you interact with the software as though you were interacting with a person. You tell the software the outcome you want using natural language and it responds to you in the first person. That is a common UI paradigm.

      example: "get the cat then drop the dog then open the door, go west and climb the ladder" - that is a natural language interface, which is what ChatGPT has. In both the Infocomm and ChatGPT case the software will respond to you in the first person as though you were interacting with someone.

      >> Infocom games were closer to Dragon's Lair than ChatGPT

      This is a puzzling comment. The UI for Zork has nothing at all to do with Dragon's Lair. In fact Dragon's Lair was possibly the least interactive of almost all computer games - it was essentially an interactive movie with only the most trivial user interaction.

      >> Infocom games were mostly about trying to figure out what command the programmer wanted you to do next.

      This was not my experience of Infocom adventures.

      4 replies →

  • I think the interactive-dialogue part is a distraction. I think the "new UI paradigm" is defined by goal-orientation, or "outcome specification". So, instead of giving the computer instructions on how to do something, users describe the end goal, and hope for the best, and then finetune the result either by adjusting their request, or by adding explicit commands.

    So, in that sense, even if Infocom games cleverly emulated the dialogue part of ChatGPT, I don't think that was the novel part claimed here.

    Think more "Make me an Infocomm-style challenge to solve. Include dragons. Do not include orcs, ogres, or any monster that uses a club."

Well, what counts as a “paradigm”? I can’t see any definition of that. If you asked 10 people to divide the history of UI into some number of paradigms, you would for sure get 10 different answers. But hey, why not pick the one that makes for a hyperbolic headline. Made me click.

  • The division does not seem arbitrary to me at all. What about the below is questionable to you?

    From sibling comment [1]:

    Nielsen is talking from the field of Human-Computer Interaction, where he is a pioneer, which deals with the point of view of human cognition. In terms of the logic of UI mechanics, what about mobile is different? Sure, gestures and touch UI bring a kind of difference. Still, from the standpoint of cognition, desktop and mobile UIs have fundamentally the same cognitive dynamics. Command line UIs make you remember commands by heart; GUIs make you select from a selection offered to you, but they still do not understand your intention. AI changes the paradigm as it is ostensibly able to understand intent, so there is no deterministic selection of available commands. Instead, the interaction is closer to collaboration.

    1: https://news.ycombinator.com/item?id=36396244

UI is a high frequency concurrency problem. The “deep rooted usability problems” (like lag, glitches, and clumsiness - general lack of fluency) are due to staffing UI projects with web designers and not concurrency engineers. The fluent conversational AI systems and other movie UIs that folks are imagining up are therefore blocked on the concurrency sub-problem. This is the space we research at Hyperfiddle, we put forth our proposed solution here: https://github.com/hyperfiddle/electric

Is it a new paradigm, or an old paradigm that finally works?

Users have been typing commands into computers for decades, getting responses of varying sophistication with varying degrees of natural language processing. Even the idea of an “AI” chatbot that mimics human writing is decades old.

The new thing is that the NLP now has some depth to it.

I don't really get this. The paradigm has always been there, it has been the technology limitations that have defined the UI so far. Having robots and computers that humans talk to has been a fixture of sci-fi movies. Perhaps the most notable example being 2001: A Space Odyssey which came out 55 years ago.

  • Sure, but it's sort of how actual usable and economical flying cars would be a paradigm change for transport. The idea exists, but it's made up fairy magic with capabilities and limitations based on plot requirements. Once it's actually made real it hardly ever ends up being used the way it was imagined.

    Like, for example, in 2001, the video call tech. They figured it would be used like a payphone with a cathode ray tube, lol. Just as in reality nobody in their right mind would hand over complete control of a trillion-dollar spaceship to a probabilistic LLM. The end applications will be completely different and cannot be imagined by those limited by the perspective of their time.

    • I don't recall a single cathode ray tube in 2001: A Space Odyssey. The film is notable for having the first depiction of a tablet computer. They went to considerable effort to show flat-screen displays instead of CRTs.

> in command-based interactions, the user issues commands to the computer one at a time, gradually producing the desired result. The computer is fully obedient and does exactly what it’s told.

> With the new AI systems, the user no longer tells the computer what to do. Rather, the user tells the computer what outcome they want.

I think that's true, and a big part of the AI revolution. Instead of filling in endless forms that have subtle controls to guide the user, we could have a simple conversation, like Siri but one that actually works.

At my current client's, we're working on a big application that has many such forms. Once filled, the forms send the data to a back-end system (SAP). There's a team trying to train an LLM so that it can answer questions about the app and about how to fill the forms.

But I think the whole point of AI, as regards to this app, is to eventually replace it entirely. Just let end users ask questions and tell the machine what they want, and the machine can build the proper data and send it to SAP.

I don't think AI is a threat for back-end systems like SAP, at least not yet. But for front-end work, it's obvious that it would be infinitely more pleasant -- and possibly, more efficient -- to tell the machine what to do rather than filling forms.

Two recent events suggest to me that this type of analytical look on interaction modes is commonly underappreciated in the industry. I write this partially from the perspective of a disillusioned student of interaction design.

1. Recent news of vehicle manufacturers moving away from touchscreens

2. Chatbot gold rush of 2018 where most businesses were sold chatbots under the guise of cost-saving

(edit: formatting)

  • I'm not sure I understand point 1 here. Do you mean that vehicle manufacturers moving away from touchscreens is bad or that they would never have moved to them in the first place if they had properly investigated the idea?

    • The latter - had they given proper thought to the consequences of moving into touch-screens they would've never gone there. Obviously I'm generalizing and discarding the impact of novelty on sales and marketing.

      2 replies →

Chat UI/UX is a tool for experts. To drive this point home, consider a user prompting "produce a founder agreement document", for which the AI will happily produce -something-. Even though the user is able to read the document, he does not understand the contents in "legal" terms. In contrast, if the user went to an expert lawyer, the lawyer would start by asking some relevant questions (domain questions) and put together a prompt tailored to the user's needs and circumstances, in the domain language, with all the relevant nuances.

I am working to create this experience by augmenting the AI interaction with step-by-step leading questions and interaction UI, similar to how users would interact with a domain expert.

https://pth.ai Would love feedback! :)
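At its core it is a two-step loop: have the model ask the domain questions first, render them as a small form, then build the tailored prompt from the answers. A minimal sketch of the idea, assuming the 2023-era openai Python client; the prompts and the founder-agreement example are simplified placeholders, not the actual pth.ai implementation:

    import json
    import openai  # assumes OPENAI_API_KEY is set in the environment

    TASK = "produce a founder agreement document"

    # Step 1: have the model act as the domain expert and list its leading questions.
    questions = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[{"role": "system",
                   "content": "You are a startup lawyer. Before drafting anything, "
                              "list the questions you would ask the client, one per line."},
                  {"role": "user", "content": TASK}],
    )["choices"][0]["message"]["content"].splitlines()

    # Step 2: present each question as a form field / wizard step and collect answers.
    answers = {q: input(q + " ") for q in questions if q.strip()}

    # Step 3: only now build the tailored prompt and draft the document.
    draft = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[{"role": "system", "content": "You are a startup lawyer drafting documents."},
                  {"role": "user",
                   "content": TASK + "\n\nClient answers:\n" + json.dumps(answers, indent=2)}],
    )["choices"][0]["message"]["content"]
    print(draft)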

  • Interesting, I like this sort of wizard approach. Would you share any implementation details? Langchain, presumably?

    • Actually, I'm not using Langchain.

      I'm now adding "agent" functionality, specifically to enable the AI to do some "research" on the web, at the moment this will also be done without a framework.

      So I'm either missing something or I am doing something simple enough that does not require the framework overhead / added value..

      3 replies →

Interesting to bundle both cli/gui under the "command" based interaction paradigm. I've never heard it described that way but it does make sense intuitively. Is that a common perception? I think of the development of the mouse/gui as a very significant event in the history of computing interfaces.

  • When you zoom out on the time scale it makes more sense. I think he's got a point. Both CLIs and GUIs are "command based". LLM prompts are more declarative. You describe what you want.

    • Well LLMs are also “command-based”. They are called prompts. In fact they’d just continue the text but were specifically trained by RLHF to be command-following.

      Actually, we have been able to make autonomous agents and agentic behavior without LLMs very well for decades. And we can program them with declarative instructions much more precisely than with natural language.

      The thing LLMs seem to do is just give non-experts a lot of the tools to get some basic things done that until now only experts could do. This has to do with the LLM modeling the domain space and reading what experts have said thus far, and allowing a non-expert to kind of handwave and produce results.

      1 reply →

What about voice assistants? These are not as impressive when compared to LLMs, so perhaps wouldn't cause a UX shift on their own. But in essence Siri, Alexa, etc also seem to put the user's intent first.

  • I'd argue that voice assistants are somewhat part of the same paradigm[1], and ChatGPT etc focused on pure text input mostly to make the research easier. Voice assistants just focused on the challenges of understanding speech, facilitated by limited allowed grammar, while ChatGPT-style research focused on the challenges of understanding language, facilitated by limiting input to text. "Just" produce ChatGPT input tokens from a voice-to-text-with-extra-hints machine and you have them combined.

    [1] Yes, voice assistants tend to be more command-oriented, but I view that as a limitation of the technology when they were popularized, not as an inherent part of the concept of a voice assistant. Voice is just an input modality.

  • A voice assistant is simply a speech-driven conversational UI; they belong to the same class of UIs as chatGPT. In fact, you could very well power your voice assistant with GPT.

I really wouldn’t call GUIs a “command-based paradigm”. Feels much more like they’re digital analogues of tools and objects. Your mouse is a tool, and you use it to interface with objects and things, and through special software it can become a more specialized tool (word processors, spreadsheets, graphic design software, etc). You aren’t issuing commands, you’re manipulating a digital environment with tools.

Which is why the notion of conversational AI (or whatever dumb name they came up with for the “third paradigm”) seems kind of alien to me. I mean, I definitely see its utility, but I find it hard to imagine it being as dominant as some are arguing it could be. Any task that involves browsing for information seems like more of an object manipulation task. Any task involving some kind of visual design seems like a tool manipulation task, unless you aren’t too picky about the final result.

Ultimately I think conversational UI is best suited not for tasks, but services. Granted, the line between the two can be fuzzy at times. If you’re looking for a website, but you don’t personally know anything about making a website, then that task morphs into a service that someone or something else does.

Which I suppose is kind of the other reason why I find the idea kind of alien. I almost never use the computer for services. I use it to browse, to create, to work, all of which entail something more intuitively suited to object or tool manipulation.

  • AutoCAD and Rhino 3D are two examples that I remember having a command prompt sitting proudly somewhere at the bottom of the UI. Your mouse clicks and keyboard shortcuts were all converted into commands in text form. If you look at your command history, it's a script - a bit boring since it is completely linear, but add loops, conditionals and function/macro support and you get a very capable scripting environment.

    • AutoCAD definitely was CLI-based, with menus and dialogs basically filling in parameters to the commands. But in the late 90s or so Autodesk got religion and decided that AutoCAD should be a Windows product and follow Microsoft UI guidelines, so I don't know how well they stuck with the "command line underneath" over the years.

      Early in AutoCAD's history, Autodesk did add loops and conditionals to its CLI -- with Lisp! Type an open paren and the command line became a REPL. You could define new commands, directly manipulate entity data structures, and have all the control structures Lisp affords -- not Common Lisp, it was way simpler, but it was powerful.

      To this day, wayward mech engineers still sometimes ask Autolisp-related questions on unrelated Lisp fora, such as r/lisp.

      1 reply →

There's something ironic to me about the fact that building AI experiences still requires the first computing paradigm: batch processing. At least, my experience building a retrieval-augmented generation system requires a lot of batch processing.

Well, I shouldn't say "requires". I'm sure you can build them without batch processing. But batch processing definitely felt like the most natural and straightforward way to do it in my experience.

  • He's talking about human-computer interaction paradigms, not computing paradigms. He's not a general computing expert, he's a UI/UX expert.

    "Batch computing" in this context refers to the era of punch cards, needing to wait for results overnight, and the difficulty of editing pre-existing programs -- and how all of that utterly dictated the style of interaction one had with computers.

    • Yep, I was aware of the difference before I made my original comment. There's still something ironic and interesting about it to me. Can't quite put my finger on it, though.

I'd argue that multi-touch gestural mobile phone and tablet interfaces were different enough from mouse and keyboard to be considered a new paradigm.

  • I'd have multi-touch be a sidebar in the textbook, but not a new section. Gestural interaction is not fundamentally different than a pointer device: it doesn't allow meaningful new behavior. It is sometimes a more intuitive way to afford the same behavior, though. I would agree that portable devices amount to a new paradigm in something—maybe UX—but not UI per se.

GPT-based UIs are inspired by the idea that if you get the right sequence of prompts you’ll get stochastically acceptable results.

So now I’m imagining the horror predictions for Word where 90% of the screen was button bars. But the twist is that you type in some text and then click on “prompt” buttons repeatedly hoping to get the document formatting you wanted, probably settling for something that was “close enough” with a shrug.

> Summary: AI is introducing the third user-interface paradigm in computing history, shifting to a new interaction mechanism where users tell the computer what they want, not how to do it — thus reversing the locus of control.

Like every query language ever.

I'm not sure the distinction between things we are searching for and things we're actively making is as different as the author thinks.

  • But this is basically the absence of a query syntax, a way to query via natural language, and not just get back a list of results, but have it almost synthesize an answer.

    To everyone who isn't a software developer, this is a new paradigm with computers. Hell even for me as a software dev it's pretty different.

    Like I'm not asking Google to find me info that I can then read and grok, I'm asking something like ChatGPT for an answer directly.

    It's the difference between querying for "documentation for eslint" but instead asking "how do you configure eslint errors to warnings" or even "convert eslint errors to warnings for me in this file".

    It's a very different mental approach to many problems for me.

    • For years I've just typed questions, in English, into browser search bars. It works great. Maybe that is why it doesn't seem like a new paradigm to me.

      2 replies →

  • In your view, then, is AI best described as an incremental improvement over (say) SQL in terms of the tasks it enables users to complete?

    • Incremental improvement over Google search. And, it's not about the tasks that it enables users to complete, it is about the UI paradigm as per the article.

      1 reply →

  • Or constraint-based programming, where some specification is given for the end result and the computer figures out how to make it happen. But that's usually a programming thing, and UIs with that kind of thing are rare.

    But I wouldn't say they were nonexistent for 60 years.

> With the new AI systems, the user no longer tells the computer what to do. Rather, the user tells the computer what outcome they want. Thus, the third UI paradigm, represented by current generative AI, is intent-based outcome specification.

Wow! For the first time ever, I will be able to describe to a trained professional what I want, and they will do it for me! Before today I used to write out the exact arm motions a carpenter would need to carve me a chair, but now I can just ask them for one!

This article is stupid. AI will make it easier for computers to interpret human interactions leading to increased efficiency and usability. Just like every other useful tool ever invented. There, I've put more insight into this comment than their article.

I built a proof of concept recently that tries to show a generic hybrid of command and intent[0]. The UI generates form representations of API calls the AI agent has decided on making to complete the task (in this case booking a meeting). Some API calls are restricted so only a human can make them, which they do by being presented with a form waiting for them to submit to continue.

If the user is vague, the bot will ask questions and try to discover the information it needs. It’s only a proof of concept but I think it’s a pattern I will try to build on, as it can provide a very flexible interface.

[0] https://gptmeetings.netcore.io/

"intent-based outcome specification"... so, a declarative language such as SQL?

  • I think you'll find that INSERT and UPDATE are very much commands. SQL queries are outcome-driven, sure, but once you try to move beyond pure queries, outcome-driven computing without some sort of machine learning gets quite difficult. And moving outside of a single SELECT is a huge barrier.

    Even within the scope of SQL, consider an ML system that can slice-and-dice previous SQL queries interactively, based on non-expert user input.

    Consider an ML system that essentially edits a proposed SQL transaction as a whole, based on your requests. Previewing results etc., adjusting INSERTs and UPDATEs as the user clarifies intent. The user's terminology focuses on the outcome, not on the individual commands, ordering, etc.

    Now move from that narrow domain into something like "I want to organize a conference", "I want to write a book", etc, and all the things that are beyond a single SQL SELECT.

    • I built a system that uses GPT to write KQL queries (similar to SQL) for a specific table. It could even combine multiple queries or throw in a custom chart if requested.

      OpenAI's models are good at writing SQL. I think they finally allow the type of use case that SQL itself was supposed to provide as originally envisioned.
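      The core of that is surprisingly small; a sketch assuming the 2023-era openai client and a made-up table schema (the query shown in the trailing comments is only the expected shape of the output, not guaranteed):

        import openai  # assumes OPENAI_API_KEY is set in the environment

        SCHEMA = ("Table Requests: Timestamp:datetime, Url:string, "
                  "StatusCode:int, DurationMs:real")

        def nl_to_kql(question):
            resp = openai.ChatCompletion.create(
                model="gpt-4",
                temperature=0,
                messages=[
                    {"role": "system",
                     "content": "You write Kusto (KQL) queries against this table only:\n"
                                + SCHEMA + "\nReturn just the query, no explanation."},
                    {"role": "user", "content": question},
                ],
            )
            return resp["choices"][0]["message"]["content"]

        print(nl_to_kql("error rate per hour over the last day, as a timechart"))
        # Expected shape of the output, e.g.:
        # Requests
        # | where Timestamp > ago(1d)
        # | summarize ErrorRate = countif(StatusCode >= 500) * 1.0 / count() by bin(Timestamp, 1h)
        # | render timechart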

  • While it was initially meant as a user interface layer of sorts, I think, it's not really something that the typical user can be expected to know nowadays.

Here is how I think about it:

The LLMs are infinity app stores. All you need is an LLM and a database, plus the ability to speak English, and you can replace most features provided by SaaS services today.

The GUI becomes a byproduct of the problem you want to solve rather than the gatekeeper to what you can solve.

https://twitter.com/Hello_World/status/1660463528984150018?s...

  • I share this opinion as well, I think. I’m looking at any CRUD app I’ve worked on and thinking: this is just a specification over a database. Same with most other software but the web and most mobile apps seem ripe to just become generated patterns.

The usability of a system is mostly irrelevant. The measure of a good UI is how much money it can get the user to spend, not how intuitively it enables the user to achieve a task.

  • Ah, I love when someone gets it! What design has come to is beyond sad, it’s revolting.

Weren’t voice assistants a new UI paradigm? Also, tellingly, they turned out to not be anywhere near as useful as people hoped. Sometimes new isn’t a good thing.

This is not a new UI paradigm. Virtual assistants have been doing exactly this for years. It's just gotten cheap and low-latency enough to be practical.

  • Yep, although they were doing it 'badly', I guess it not being quite so terrible is the 'new paradigm', which is eyeroll worthy IMO.

>Then Google came along, and anybody could search

Then they flooded the search results with ads and now you can search but hardly find.

I bet the same will happen with software like ChatGPT.

It would be neat if someone could make a good adventure game with an LLM, but they’re too prone to getting argued into just letting you win or whatever.

> Clicking or tapping things on a screen is an intuitive and essential aspect of user interaction that should not be overlooked.

I don't know, is it? Humanity made do without it for thousands of years.

I agree that chat UI is not the answer. It's a great start and a very familiar UI, but I feel this will default to more traditional UI that shows predefined actions and buttons depending on the workflow.

I am surprised this article does not even mention multimodal LLMs. Because the more kinds of media the LLM can take as input and interpret, the easier the interaction with it gets.

ChatGPT and all AI is crap. If we don't want to use it, then it will fail to exist and will be nothing but a massive new failure for Microsoft.

FB’s AI head just said LLMs are a fad.

I thought about how to use them… I wish they could render an interface (HTML and JS at least, but also produce artifacts like PowerPoints).

What is really needed is for LLMs to produce some structured markup, that can then be rendered as dynamic documents. Not text.

As input, natural language is actually inferior to GUIs. I know the debate between command-line people and GUI people, and LLMs would seem like they'd boost the command-line people's case, but any powerful system would actually benefit from a well-designed GUI.

  • As someone who just spent 2 hours in my company's Confluence site, trying to track down the answer to a single question that could have been resolved in seconds by an LLM trained on an internal corporate corpus -- LLMs are very much not a fad.

    • LLMs are useful for particular types of things.

      LLMs as the solution for every, or most, problems is a fad.

    • How do you know the answer is right?

      Because it linked you to the source?

      Like a vector database would? Google has offered to index sites since 1996.

      3 replies →

  • Here is the main reason:

    Any sufficiently advanced software has deep structure and implementation. It isn’t like a poet who can just bullshit some rhymes and make others figure out what they mean.

    The computer program expects some definite inputs which it exposes as an API eg a headless CMS via HTTP.

    Similar with an organization that can provide this or that service or experience.

    Therefore given this rigidity, the input has limited options at every step. And a GUI can gracefully model those limitations. A natural language model will make you think there is a lot of choice but really it will boil down to a 2018-era chatbot that gives you menus at every step and asks whether you want A, B or C.

That website has a surprisingly boring design. I haven't looked at it in years, and was expecting some impressively clean and elegant design. But it looks like a Wordpress site.

  • I read this comment before clicking, and wow, oh boy do I disagree! The information design is impressively straight-forward. I can see every feature of the site right away with no overload or distraction from the content. There's an intuitive distinction categorizing every page element and I know what everything does and how to get everywhere without having to experiment. The fonts, spacing, groupings, and colors are all nice looking, purposeful, and consistent.

    I'm not exactly sure how you're using the word "boring" in this context. There are good kinds of boring and bad kinds of boring, and I think this is the good kind.

  • I'll be honest...I like it. Boring with easily readable content is far better than most of the other junk that is put forward nowadays...

  • What isn’t “clean” about it?

    I’ve found it incredibly easy to navigate and digest its content. What more are you looking for?

  • It's clear, easy to read, and easy to navigate. I wish lots more of the web were as "boring" as this site.

    • yeah the site is bad, but not because it is boring, but because it should be even more simplified than it is now. almost all of the CSS "finishing touches" have something wrong with them. the content shifts on page load: https://hiccupfx.telnet.asia/nielsen.gif and there's bizarre dropdown button behavior: https://hiccupfx.telnet.asia/what.gif and i can go on and on. i don't feel this nitpick whining is unwarranted considering the site purports to be a leader in user experience.

    • Reading this made me realize just how much my priorities have changed over the course of my career. In the beginning, this is exactly the kind of thing I would absolutely never let pass, and I still am very keen to fix this kind of ugliness when I have the leeway. But nowadays, I'm ecstatic just to see something useful and not confusing or frustrating. These kinds of rough edges that give the user the impression of crappy software but don't materially harm usability have come to be second-order issues that I often don't even think about until larger problems have been fixed. Arguably what I've adopted is a form of pessimism.

  • Maybe you could do a CSS redesign of it? You could even hold a contest on Twitter or on blogs to compare the redesigns/relooks people are coming up with?

    That could be interesting.