Comment by d_burfoot
6 days ago
It's a radical change in human/computer interface. Now, for many applications, it is much better to present the user with a simple chat window and allow them to type natural language into it, rather than ask them to learn a complex UI. I want to be able to say "Delete all the screenshots on my Desktop", instead of going into a terminal and typing "rm ~/Desktop/*.png".
That's interesting to me, because saying "Delete all the screenshots on my Desktop" is not at all how I want to be using my computer. When I'm getting breakfast, I don't instruct the banana to "peel yourself and leap into my mouth," then flop open my jaw like a guppy. I just grab it and eat it. I don't want to tell my computer to delete all the screenshots (except for this or that particular one). I want to pull one aside, sweep my mouse over the others, and tap "delete" to vanish them.
There's a "speaking and interpreting instructions" vibe to your answer which is at odds with my desire for an interface that feels like an extension of my body. For the most part, I don't want English to be an intermediary between my intent and the computer. I want to do, not tell.
> I want to do, not tell.
This 1000%.
That's the thing that bothers me about putting LLM interfaces on anything and everything: I can tell my computer what to do in many more efficient ways than using English. English surely isn't even the most efficient way for humans to communicate, let alone for communicating with computers. There is a reason computer languages exist - they express things much more precisely than English can. Human languages are full of ambiguity and subtle context-dependence; some are more precise and logical than English, for sure, but all are far from ideal.
I could either:
A. Learn to do a task well; after some practice, it becomes almost automatic. I gain a dedicated neural network, trained to do said task, very efficient and instantly accessible the next time I need it.
Or:
B. Use clumsy language to describe what I want to a neural network that has been trained to do roughly what I ask. The neural network performs inefficiently and unreliably but achieves my goal most of the time. At best this seems like a really mediocre way to do a lot of things.
I basically agree, but with the caveat that the tradeoff is the opposite for a bunch of tedious things that I don't want to invest time into getting better at, or which maybe I only do rarely.
This. Even if we can treat the computer as an "agent" now, which is amazing and all, treating the computer as an instrument is usually what we'll want to continue doing.
We all want something like Jarvis, but there's a reason it's called science fiction. Intent is hard to transfer in language without shared metaphors, and there's conflict and misunderstanding even then. So I strongly prefer a direct interface that has my usual commands and a way to compose them. Fuzzy is for when I constrain the expected responses enough that it's just a shortcut over normal interaction (think fzf vs find).
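To make the fzf point concrete, here's a minimal sketch of fuzzy-as-a-shortcut, assuming fzf is installed and the screenshots match a known pattern:

```sh
# find enumerates the candidates; fzf merely constrains the choice to
# that set (Tab to multi-select); only what you explicitly pick is deleted.
find ~/Desktop -maxdepth 1 -name '*.png' -print0 \
  | fzf --multi --read0 --print0 \
  | xargs -0 rm
```

The fuzzy matching never gets to invent an action: the candidate set and the verb are fixed; only the selection is fuzzy.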
Do we? For commanding use cases, articulating the action in English can feel more difficult than just doing it. Direct manipulation feels more primal to me.
Genuine question: which part of Jarvis is still science fiction? Interacting with a flying suit of armor powered by a fictional pseudo-infinite power source is, as are the robots and the fighting of aliens & supervillains. But as far as having a companion like the one in the movie "Her" that you can talk with about your problems, ChatGPT is already there. People have customized their ChatGPT through the memories feature, given it a custom name, and tuned how they want it to respond (sassy/sweet/etc.) and how they want it to refer to them. They'll have conversations with it about whatever. It can go and search the Internet for stuff. Other than using it to manipulate a flying suit of armor (which doesn't exist) to fight aliens, and granting that the jury's still out on how efficient it is, which parts are still science fiction? I'm assuming there's a big long list of things; I'm just not well versed enough in the lore to have a list of things that genuinely still seem impossible versus which seem like just an implementation detail that someone probably already has an MCP for.
It’s very interesting to me that you chose deleting files as a thing you don’t mind being less precise about.
I personally can't see this example working out. I'll always want to get some kind of confirmation of which files will be deleted, and at that point, just typing the command out is much easier than reading.
You can just ask it to undelete what you want back. Or print out a list of possible files to delete, with checkboxes so you can pick. Or have it prompt you one by one. You can ask it to ask you verbally and respond through the mic. Or just have it put the files into a hidden folder, making a note of where, so when you ask about them again it knows where they are.
Something like Gemini Diffusion can write simple applets/scripts in under a second, so your options for how to handle those deletions are enormous. Hell, if you really want, you can ask it to make you a pseudo-terminal that lets you type in the old Linux commands to remove them.
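For what it's worth, a couple of those options are one-liners even without a model; a minimal sketch in plain shell, assuming the screenshots all match `*.png`:

```sh
# One-by-one prompting: -i asks before each removal.
rm -i ~/Desktop/*.png

# The "hidden folder" variant: stash instead of delete, so it's recoverable.
mkdir -p ~/.screenshot-trash
mv ~/Desktop/*.png ~/.screenshot-trash/
```

which is exactly the kind of thing a sub-second code model could generate and run on the fly.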
Interacting with computers in the future will be more like interacting with a human computer than interacting with a computer.
> I want to be able to say "Delete all the screenshots on my Desktop", instead of going into a terminal and typing "rm ~/Desktop/*.png".
Both are valid cases, but one cannot replace the other—just like elevators and stairs. The presence of an elevator doesn't eliminate the need for stairs.
> I want to be able to say "Delete all the screenshots on my Desktop", instead of going into a terminal and typing "rm ~/Desktop/*.png".
But why? It takes many more characters to type :)
Because with the spoken request, your assistant will delete snapshot-01.png and snapshot-02.jpeg, while avoiding deleting my-kids-birthday.png by mistake. The bare `rm ~/Desktop/*.png` gets it wrong in both directions: it misses the .jpeg and nukes the birthday photo.
Will it? With how they work I find it more likely to run a `sudo rm -rf /*` than anything else.
ChatGPT has just told me you should rather do `rm ~/Desktop/snapshot-*` in this case. I'm so impressed with this new shiny AI tech, I'd never be able to figure that out on my own!
The junior will repeatedly ask the AI to delete the screenshots, until he forgets the command to delete a file.
The engineer will wonder why his desktop is filled with screenshots, change the setting that makes it happen, and forget about it.
That behavior happened for years before AI, but AI will make that problem exponentially worse. Or I do hope that was a bad example.
Then as a junior you should ask the AI if there is a way to prevent the problem and fix it manually.
You might then argue that they don't know they should ask that; but you could just configure the AI once to say you're a junior engineer, and that when you ask it to do something, you also want it to help you learn how to avoid problems and prevent them from happening.
The command to delete a file is "chatgpt please delete this file". Or could you not imagine a world where we build layers on top of unlink or whatever syscalls are relevant?
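`rm` itself is already such a layer. On Linux with coreutils and strace installed, you can watch it bottom out in the syscall (the output line below is illustrative):

```sh
# Trace the file-removal syscalls that rm ends up making.
strace -e trace=unlink,unlinkat rm /tmp/example.png
# → unlinkat(AT_FDCWD, "/tmp/example.png", 0) = 0
```

A "chatgpt please delete this file" front-end is just one more layer on the same stack.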
This is why, even if LLMs top out right now, there will still be a radical shift in how we interact with and use software going forward. There are still at least 5 years of implementation work ahead even if nothing advances at all anymore.
No one is ever going to want to touch a settings menu again.
> No one is ever going to want to touch a settings menu again.
This is exactly like thinking that no one will ever want a menu in a restaurant; they'll just describe the food they'd like to the waiter. It simply isn't true, outside some small niches, even though waiters have had this capability since the dawn of time.
This is a good comparison, because using computers will be like having a waiter you can just tell "No lettuce", rather than trying to figure out whatever way the dev team thought was best to add or subtract ingredients.