
Comment by timabdulla

1 month ago

I think one thing ignored here is the value of UX.

If a general AI model is a "drop-in remote worker", then UX matters not at all, of course. I would interact with such a system the same way I would with one of my colleagues, and I would also give a high level of trust to such a system.

If the system still requires human supervision, or works to augment a human worker rather than replace them, then a specific, tailored user interface can be very valuable, even if the product is mostly just a wrapper around an off-the-shelf model.

After all, many SaaS products could be built on top of a general CRM or ERP, yet we often find a vertical-focused UX has a lot to offer. You can see this in the AI space with a product like Julius.

The article seems to assume that most of the value brought by AI startups right now is adding domain-specific reliability, but I think there's plenty of room to build great experiences atop general models that will bring enduring value.

If and when we reach AGI (the drop-in remote worker referenced in the article), then I personally don't see how the vast majority of companies - software and others - are relevant at all. That just seems like a different discussion, not one of business strategy.

The value of UX is being ignored because the magical thinking has these AIs being fully autonomous, which will not work. The phrase "the devil's in the details" needs to be imprinted on everyone's screens, because the details of a "drop-in remote worker" amount to several Grand Canyons yet to be crossed. This civilization is vastly more complex than you, dear reader, realize, and the majority of that complexity is not written down.

  • Also, the UX of your potential "remote workers" is vitally important! The difference between a good and a bad remote worker is almost always how well they communicate - both how they read and understand tickets of work to be done and how well they explain, annotate, and document the work they do.

    At the end of the day, someone has to be checking the work. This is true of humans and of any potential AI agent, and the UX of that is a big deal. I can get on a call and talk through the code another engineer on my team wrote and make sure I understand it and that it's doing the right thing before we accept it. I'm sure at some point I could do that with an LLM, but the worry is that the LLM has no innate loyalty or sense of its own accuracy or honesty.

    I can mostly trust that my human coworker isn't bullshitting me and any mistakes are honest mistakes that we'll learn from together for the future. That we're both in the same boat where if we write or approve malicious or flagrantly defective code, our job is on the line. An AI agent that's written bad or vulnerable code won't know it, will completely seriously assert that it did exactly what it was told, doesn't care if it gets fired, and may say completely untrue things in an attempt to justify itself.

    Any AI "remote worker" is a totally different trust and interaction model. There's no real way to treat it like you would another human engineer because it has, essentially, no incentive structure at all. It doesn't care if the code works. It doesn't care if the team meets its goals. It doesn't care if I get fired. I'm not working with a peer, I'm working with an industrial machine that maybe makes my job easier.

    • It's hilarious that people don't see this. The UX of an "LLM product" is the quality of the text in, text out. An "aligned model" is one with good UX. Instruct tuning is UX. RLHF is UX.

I guess part of the point is that the value of the UX will quickly start to decrease as more tasks, or parts of tasks, can be done without close supervision. And that is subject to the capabilities of the models, which continue to improve.

I suggest that before we satisfy _everyone_'s definition of AGI, more and more people may decide we are there as their own jobs are automated.

The UX at that point, maybe in 5 or 10 or X years, might be a 3D avatar that pops up in your room via mixed-reality glasses, talks to you, and then just fires off instructions to a small army of agents on your behalf.

Nvidia actually demoed something a little bit like that a few days ago. Except it lives on your computer screen and probably can't manage a lot of complex tasks on its own. Yet.

Or maybe at some point it won't need sub-agents and can just accomplish all of the tasks on its own. Based on the Bitter Lesson, specialized agents are probably going to have a limited lifetime as well.

But I think it's worth having the AGI discussion as part of this because it will be incremental.

Personally, I feel we must be pretty close to AGI because Claude can do a lot of my programming for me. I still have to make important suggestions, often for obvious things, but it is much better than I am at filling in all the details and has much broader knowledge.

And the models do keep getting more robust, so I seriously doubt that humans will be better programmers overall for much longer.

  • Which is an easier way to interact with your bank? Writing a business letter, or filling out a form?

    I suspect that we will still be filling out forms, because that’s a better UI for a routine business transaction. It’s easier to know what the bank needs from you if it’s laid out explicitly, and you can also review the information you gave them to make sure it’s correct.

    AI could still be helpful for finding the right forms, auto-filling some fields, answering any questions you might have, and checking for common errors, but that’s only a mild improvement from what a good website already does.

    And yes, it’s also helpful for the programmers writing the forms. But the bank still needs people to make sure that any new forms implement their consumer interactions correctly, that the AI assist has the right information to answer any questions, and that it’s all legal.

Chat models make UI redundant. Who will want to learn some app's custom interface when they are used to just asking it to do what they want or need? Chat is the most natural interface for humans. UX will eventually just be trying to steer models to kiss your butt in the right way, and the bar for this will be low, as problems with language interaction are going to be obvious even to teenagers.

  • The amount of work going into RLHF/DPO/instruct tuning and other types of post-training exists because UX is very important. The bar is high, and the difficulty of making a model with a good UX for a given use case is high.
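    For what it's worth, here is what "post-training as UX" looks like concretely: a minimal sketch of the DPO loss (the function name, tensor names, and beta value here are illustrative, not from any particular library). The entire objective is "make the response humans preferred more likely", which is UX expressed as a loss function.

      import torch
      import torch.nn.functional as F

      def dpo_loss(policy_chosen_logp: torch.Tensor,
                   policy_rejected_logp: torch.Tensor,
                   ref_chosen_logp: torch.Tensor,
                   ref_rejected_logp: torch.Tensor,
                   beta: float = 0.1) -> torch.Tensor:
          # Each input is the summed log-probability of a human-preferred
          # ("chosen") or dispreferred ("rejected") response, scored under
          # the policy being trained and under a frozen reference model.
          chosen_ratio = policy_chosen_logp - ref_chosen_logp
          rejected_ratio = policy_rejected_logp - ref_rejected_logp
          # -log sigmoid(beta * margin): the loss shrinks as the preferred
          # answer becomes relatively more likely under the policy than the
          # rejected one, compared to the reference model.
          return -F.logsigmoid(beta * (chosen_ratio - rejected_ratio)).mean()

    Every preference pair fed through something like this is a human judgment about which output was nicer to receive, i.e. UX data.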

A drop-in remote worker will still require their work to be checked, and their access to the systems they need to do their work secured, in case they are a bad actor.