Comment by gallerdude

5 days ago

It is interesting that most of our modes of interaction with AI are still just textboxes. The only big UX change in the last three years has been the introduction of the Claude Code / OpenAI Codex tools. They feel amazing to use, like you're working with another independent mind.

I am curious what the user interfaces of AI will be in the future. I think whoever can crack that will create immense value.

Text is very information-dense. I'd much rather skim a transcript in a few seconds than watch a video.

There's a reason keyboards haven't changed much since the 1860s when typewriters were invented. We keep coming up with other fun UI like touchscreens and VR, but pretty much all real work happens on boring old keyboards.

  • I’ve been using ChatGPT Atlas since release on my personal laptop. I very often have it generate a comprehensive summary for YouTube videos, so I don’t have to sit there and watch/scrub a half-hour video when a couple of pages of text contain the same content.

  • Here's an old blog post that explores that topic at least with one specific example: https://www.loper-os.org/?p=861

    The gist is that keyboards are optimized for ease of use but that there could be other designs which would be harder to learn but might be more efficient.

    • >> There's a reason keyboards haven't changed much since the 1860s when typewriters were invented.

      > The gist is that keyboards are optimized for ease of use but that there could be other designs which would be harder to learn but might be more efficient.

      Here's a relevant trivia question: assuming a person has two hands with five digits each, what is the largest number they can count to using only those digits?

      Answer: (2 ** 10) - 1 = 1023, treating each finger as one binary digit (raised or lowered).

      Ignoring keyboard layout options (such as QWERTY vs DVORAK), IMHO keyboards have the potential for capturing thought faster and with a higher degree of accuracy than other forms of input. For example, it is common for touch-typists to be able to produce 60 - 70 words per minute, for any definition of word.

      Modern keyboard input efficiency can be correlated to the ability to choose between dozens of glyphs with one or two finger combinations, typically requiring less than 2cm of movement to produce each.
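
The finger-counting answer above can be checked with a short sketch. It assumes each finger is one binary digit (raised = 1, lowered = 0); the function names are made up for illustration.

```python
def max_finger_count(fingers: int = 10) -> int:
    """Largest number representable when each finger is one binary digit."""
    return 2 ** fingers - 1


def fingers_to_number(raised: list[bool]) -> int:
    """Read a sequence of raised/lowered fingers as binary, most significant first."""
    value = 0
    for is_up in raised:
        value = value * 2 + int(is_up)
    return value


print(max_finger_count())              # 1023
print(fingers_to_number([True] * 10))  # 1023
```

The same idea is why chording pays off: ten independent on/off inputs distinguish 1024 states, far more than ten one-key-at-a-time presses.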

  • And anyone that has ever tried to talk to Siri or Alexa would prefer a keyboard for anything but the most simple questions. I don't think that will change for a long time if ever. The lack of errors and being able to say exactly what you want is so valuable.

  • No matter how good a keyboard we might be able to invent it'll always be slower than a direct brain interface, and we have those, in a highly experimental way, now.

    One day we will look back at improvements to keyboards and touchscreens as the 'faster horse' of the physical interface era.

    • I'm not convinced, because all a keyboard really costs you is latency, while almost every human-machine interaction is actually bandwidth limited (by human output).

      I'd assume that even zero latency from a perfect brain-machine interface would not make you meaningfully faster at most things.
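
A rough way to see the latency-versus-bandwidth point is to model input time as a fixed latency plus message size divided by throughput. This is only a sketch: the 65 words per minute comes from the typing range mentioned earlier in the thread, while the 0.2 second latency and the 200-word message are made-up illustrative numbers.

```python
def seconds_to_enter(words: int, words_per_minute: float, latency_s: float) -> float:
    """Total time = fixed latency + words / throughput (in words per second)."""
    return latency_s + words / (words_per_minute / 60.0)


WORDS = 200  # hypothetical message length

# Same human output rate in both cases; only the device latency differs.
keyboard = seconds_to_enter(WORDS, words_per_minute=65, latency_s=0.2)
zero_latency = seconds_to_enter(WORDS, words_per_minute=65, latency_s=0.0)

print(f"keyboard:     {keyboard:.1f} s")
print(f"zero latency: {zero_latency:.1f} s")
print(f"saved:        {keyboard - zero_latency:.1f} s")
```

Under these assumptions, removing latency saves a fraction of a second on a message that takes minutes to produce, so only a higher output rate would change the total much.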

Unix CLI utilities have been all text for 50 years. Arguably that is why they are still relevant. Attempts to impose structured data on the paradigm like those in PowerShell have their adherents and can be powerful, but fail when the data doesn't fit the structure.

We see a similar tendency toward the most general interfaces in "operator mode" and similar the-AI-uses-the-mouse-and-keyboard schemes. It's entirely possible for every application to provide a dedicated interface for AI use, but it turns out to be more powerful to teach the AI to understand the interfaces humans already use.
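
The point about structure failing when the data doesn't fit it can be sketched in a few lines. This is a toy illustration, not how PowerShell or any Unix tool works internally: the log lines, `grep`, and `structured_errors` are all hypothetical.

```python
import json

# Two well-formed records and one free-form line that does not fit the structure.
lines = [
    '{"level": "error", "msg": "disk full"}',
    'WARN: free-form message that is not JSON, error code 7',
    '{"level": "info", "msg": "started"}',
]


def grep(pattern: str, lines: list[str]) -> list[str]:
    """Text-style filter: works on any line, no schema required."""
    return [line for line in lines if pattern in line]


def structured_errors(lines: list[str]) -> list[dict]:
    """Structure-style filter: requires every line to parse as a JSON object."""
    records = [json.loads(line) for line in lines]  # raises on the free-form line
    return [r for r in records if r.get("level") == "error"]


print(grep("error", lines))

try:
    structured_errors(lines)
except json.JSONDecodeError as exc:
    print("structured filter failed:", exc)
```

The text filter is cruder, but it keeps working on input nobody planned for, which is the trade-off described above.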

  • PowerShell is completely suitable. People are just used to bash and don’t feel the incentive to switch, especially with Windows becoming less relevant outside of desktop development.

    • PowerShell feels like it wasn't built for practical use, unlike Unix tools, which were built by and for developers and feel good to use because they actually get used a lot.

      Like, to set an env variable permanently, you either have to go through 5 GUI interfaces, or use this PS command:

      [Environment]::SetEnvironmentVariable("INCLUDE", $env:INCLUDE, [System.EnvironmentVariableTarget]::User)

      Which is honestly horrendous. Why the brackets? Why the double colons? Why the uppercase everywhere? I get that it's trying to look more "OOP-ish" and like C#, but nobody wants to work with that kind of shell script tbh. It's just one example, but all the PowerShell commands look like this, unless they have been aliased to trick you into thinking Windows has become more Unix-ish.

    • It took a long time for PowerShell to write files, by default, in the same encoding it reads them in. Very confusing until then.

  • Yet the most popular platforms on the planet have people pointing a finger (or several) at a picture.

    And the most popular media format on the planet is and will be (for the foreseeable future), video. Video is only limited by our capacity to produce enough of it at a decent quality, otherwise humanity is definitely not looking back fondly at BBSes and internet forums (and I say this as someone who loves forums).

    GenAI will definitely need better UIs for that kind of universal adoption (think smartphone - 8/9 billion people).

    • > Video is only limited by our capacity to produce enough of it at a decent quality, otherwise humanity is definitely not looking back fondly at BBSes and internet forums

      Video is limited by playback speed. It is a time-dependent format. Efforts can be made to enable video to be viewable at a range of speeds, but they are always somewhat constrained. Controlling video playback to slow down and rewatch certain parts is just not as nice as dealing with the same thing in text (or static images), where it’s much easier to linger and closely inspect parts that you care more about or are struggling to understand. Likewise, it’s easier to skim text than video.

      This is why many people prefer transcripts, or articles, or books over videos.

      I seriously doubt that people would want to switch text-based forums to video if only video were easier to make. People enjoy writing for the way it inspires a different kind of communication and thought. People like text so much that they write in journals that nobody will ever see, just because it helps them organize their thoughts.

When we have really fast and good models, they will be able to generate a GUI on the fly. It could probably be done now with a fine-tune on some kind of XML-based UI schema or something. I gave it a try but couldn't figure it out entirely, and consistency would be an issue too.
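
One way to approach the consistency problem is to check each generated layout against a small whitelist before rendering it. The sketch below assumes a made-up XML vocabulary (`form`, `label`, `input`, `button`) and a hypothetical `validate_ui` helper; it is not an existing UI schema, and the `generated` string stands in for model output.

```python
import xml.etree.ElementTree as ET

ALLOWED_TAGS = {"form", "label", "input", "button"}  # hypothetical widget set


def validate_ui(xml_text: str) -> list[str]:
    """Return a list of problems; an empty list means the layout is acceptable."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return [f"not well-formed XML: {exc}"]
    problems = []
    for element in root.iter():
        if element.tag not in ALLOWED_TAGS:
            problems.append(f"unknown widget <{element.tag}>")
        if element.tag == "input" and "name" not in element.attrib:
            problems.append("<input> is missing a name attribute")
    return problems


# Stand-in for a layout produced by a model.
generated = """
<form>
  <label>Email</label>
  <input name="email"/>
  <button>Send</button>
</form>
"""

print(validate_ui(generated))                 # []
print(validate_ui("<form><slider/></form>"))  # ['unknown widget <slider>']
```

A rejected layout could be sent back to the model with the problem list, though whether that loop converges reliably is an open question.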

I agree. I think the world is fundamentally multimodal. Getting a chat to be truly multimodal, i.e. interacting with different data types and text in a unified way, is going to be the next big thing. Given how robotics is taking off, 3D might be another important aspect of it. At vlm.run we are trying to make this possible by combining VLMs and LLMs in a seamless way to get the best UI. https://chat.vlm.run/c/3fcd6b33-266f-4796-9d10-cfc152e945b7

Personally I find the information density of text to be the "killer feature". I've tried voice interaction (even built some AI Voice Agents) and while they are very powerful, easy to use and just plain cool, they are also slow. Nothing beats skimming over a generated text response and just picking out chunks of text, going back and forth, rereading, etc. Text is also universal: I can't copy-paste a voice response to another application/interface or iterate over it.

My personal view is that the search for a better AI User Interface is just the further dumbing down of the humans who use these interfaces. Another comment mentioned that the most popular platforms are people pointing fingers at pictures and without a similar UI/UX AI would never reach such adoption rates, but is that what we want? Monkeys pointing at colorful picture blobs?

People get a little too hung up on finding the AI UI. It does not seem at all necessary that the interfaces will be much different (while the underlying tech certainly will be).

Text and boxes and tables and graphs are what we can cope with. And while the AI is going to change a lot, we are not.

I get what you’re saying here, and you’re right that other UIs will be a big deal in the near future… but I don’t think it’s fair to say “just” textboxes.

This is HN. A lot of us work remotely. Speaking for myself, I much prefer to communicate via Slack (“just a textbox”) over jumping into a video call. This is especially true with technical topics, as text is both more dense and far more clear than speech in almost all cases.

Grok has been integrated into Tesla vehicles, and I've had several voice interactions with it recently. Initially, I thought it was just a gimmick, but the voice interactions are great and quite responsive. I've found myself using it multiple times to get updates on the news or quick questions about topics I'm interested in.

If you are interested in UX, a YouTube series I found enjoyable and thought-provoking is "liber indigo" (sorry, on mobile)

What comes after the desktop metaphor and mobile? There is VR but... no one is sure it will get anywhere. It's cool but probably won't supplant tradition.

Maybe the ability of AI to accept somewhat imprecise inputs will help us get away from text. Multimodal gesture, voice, and touch, perhaps? We would all be acting with our bodies, like players on a stage, to convey to a machine which direction we wish to turn its attention.

ChatGPT's voice is absolutely amazing and I prefer it to text for brainstorming.

  • Ooooh, it bothers me, so, so, so much. Too perky. Weirdly casual. Also, it's based on the old 4o code - sycophancy and higher hallucinations - watch out. That said, I too love the omni models, especially when they're not nerfed. (Try asking for a Boston, New York, Parisian, Haitian, Indian and Japanese accent from 4o to explore one of the many nerfs they've done since launch)

    • I think the commenter you're replying to was talking about dictating to ChatGPT, which I also find extremely useful.