Comment by dvt
1 day ago
What we need, imo, is:
1. A new UX/UI paradigm. Writing prompts is dumb, re-writing prompts is even dumber. Chat interfaces suck.
2. "Magic" in the same way that Google felt like magic 25 years ago: a widget/app/thing that knows what you want to do before even you know what you want to do.
3. Learned behavior. It's ironic how even something like ChatGPT (it has hundreds of chats with me) barely knows anything about me & I constantly need to remind it of things.
4. Smart tool invocation. It's obvious that LLMs suck at logic/data/number crunching, but we have plenty of tools (like calculators or wikis) that don't. It's a mistake that tool invocation is still in its infancy; it should be at the forefront of every AI product.
5. Finally, we need PRODUCTS, not FEATURES; and this is exactly Pete's point. We need things that re-invent what it means to use AI in your product, not weirdly tacked-on features. Who's going to be the first team that builds an AI-powered operating system from scratch?
I'm working on this (and I'm sure many other people are as well). Last year, I worked on an MVP called Descartes[1][2], a Spotlight-like OS widget. I'm re-working it this year after having some friends and family test it out (and iterating on the idea of ditching the chat interface).
[1] https://vimeo.com/931907811
[2] https://dvt.name/wp-content/uploads/2024/04/image-11.png
> 3. Learned behavior. It's ironic how even something like ChatGPT (it has hundreds of chats with me) barely knows anything about me & I constantly need to remind it of things.
I've wondered about this. Perhaps the concern is that saved data will eventually overwhelm the context window? And so you must be judicious about the "background knowledge" about yourself that gets remembered, and this problem is harder than it seems?
Btw, you can ask ChatGPT to "remember this". Ime the feature feels like it doesn't always work, but don't quote me on that.
Yes, but this should be trivially done with an internal `MEMORY` tool the LLM calls. I know that the context can't grow infinitely, but this shouldn't prevent filling the context with relevant info when discussing topic A (even a lazy RAG approach should work).
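Something like this toy sketch, say. Everything here is made up for illustration (the `MemoryStore` class, the keyword-overlap scoring standing in for real embedding search); it's just the lazy-RAG idea in miniature, not anything ChatGPT actually exposes:

```python
# Hypothetical internal MEMORY tool the LLM could call.
# Facts are saved as plain strings and retrieved by crude keyword
# overlap; a real version would use embeddings, but the shape is the same.

class MemoryStore:
    def __init__(self):
        self.facts: list[str] = []

    def remember(self, fact: str) -> None:
        """Called when the model decides something is worth persisting."""
        self.facts.append(fact)

    def recall(self, query: str, k: int = 3) -> list[str]:
        """Return the k facts sharing the most words with the query."""
        q = set(query.lower().split())
        return sorted(
            self.facts,
            key=lambda f: len(q & set(f.lower().split())),
            reverse=True,
        )[:k]


memory = MemoryStore()
memory.remember("User prefers concise answers and works mostly in Rust.")
memory.remember("User is building a Spotlight-like OS widget.")

# When discussing topic A, prepend only the relevant memories to the
# prompt instead of the entire chat history, so the context stays small.
relevant = memory.recall("help me design the widget's chat-free UI")
prompt = "Known about the user:\n" + "\n".join(relevant) + "\n\nUser: ..."
print(prompt)
```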
You're asking for a feature like this one. Future advances will help with it.
https://youtu.be/ZUZT4x-detM
What you're describing is just RAG, and it doesn't work that well. (You need a search engine for RAG, and the ideal search engine is an LLM with infinite context. But the only way to scale LLM context is by using RAG. We have infinite recursion here.)
Feature Request: Can we have dark mode for videos? An AI OS should be able to understand and satisfy such a use case.
E.g. Scott Aaronson | How Much Math Is Knowable?
https://youtu.be/VplMHWSZf5c
The video's slides could be converted to dark mode for night viewing.
On the tool-invocation point: Something that seems true to me is that LLMs are actually too smart to be good tool-invokers. It may be possible to convince them to invoke a purpose-specific tool rather than trying to do it themselves, but it feels harder than it should be, and weird to be limiting capability.
My thought is: Could the tool-routing layer be a much simpler "old school" NLP model? Then it would never try to do math and end up doing it poorly, because it just doesn't know how to do that. But you could give it a calculator tool and teach it how to pass queries along to that tool. And you could also give it a "send this to a people LLM tool" for anything that doesn't have another more targeted tool registered.
Is anyone doing it this way?
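Roughly what I have in mind, as a toy sketch: a router that cannot do math itself, only decide where a query goes. The regex "classifier", `calc`, and `call_llm` are all made-up stand-ins, not a real framework:

```python
# Toy "old school" routing layer: math-looking queries go to a calculator
# tool, everything else falls through to a general-purpose LLM.
import re

MATH_RE = re.compile(r"^[\d\s\.\+\-\*/\(\)]+$")

def calc(expr: str) -> str:
    # A real implementation would use a proper expression parser, not eval().
    return str(eval(expr, {"__builtins__": {}}))

def call_llm(query: str) -> str:
    return f"[LLM would answer: {query!r}]"

def route(query: str) -> str:
    if MATH_RE.fullmatch(query.strip()):
        return calc(query)
    return call_llm(query)

print(route("12 * (3 + 4)"))       # -> 84, via the calculator tool
print(route("Why do cats purr?"))  # -> forwarded to the LLM
```

The router could also be a small intent classifier instead of regexes; the point is that it is too simple to ever attempt the math itself.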
> Is anyone doing it this way?
I'm working on a way of invoking tools mid-tokenizer-stream, which is kind of cool. So for example, the LLM says something like (simplified example) "(lots of thinking)... 1+2=" and then there's a parser (maybe regex, maybe LR, maybe LL(1), etc.) that sees that this is a "math-y thing" and automagically goes to the CALC tool which calculates "3", sticks it in the stream, so the current head is "(lots of thinking)... 1+2=3 " and then the LLM can continue with its thought process.
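A toy version of what I mean, with everything illustrative: in the real thing the chunks come straight off the decoder and the parser is more than one regex, but the splicing works the same way:

```python
# Toy mid-stream tool invocation: as decoded text streams out of the model,
# a regex watches for an arithmetic expression ending in "=" and splices in
# the CALC tool's result before generation continues.
import re

CALC_TRIGGER = re.compile(r"(\d+(?:\s*[\+\-\*/]\s*\d+)+)\s*=\s*$")

def calc_tool(expr: str) -> str:
    return str(eval(expr, {"__builtins__": {}}))

def stream_with_tools(chunks):
    head = ""
    for chunk in chunks:          # chunks would come from the decoder
        head += chunk
        m = CALC_TRIGGER.search(head)
        if m:
            # Inject the result so the model continues from "1+2=3 "
            head += calc_tool(m.group(1)) + " "
    return head

# Simulated decoder output, chunk by chunk:
tokens = ["(lots of thinking)... ", "1", "+", "2", "=", " so the answer is"]
print(stream_with_tools(tokens))
# (lots of thinking)... 1+2=3  so the answer is
```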
Cold winds are blowing when people look at LLMs and think "maybe an expert system on top of that?".
Definitely an interesting thought to do this at the tokenizer level!
> 1. A new UX/UI paradigm. Writing prompts is dumb, re-writing prompts is even dumber. Chat interfaces suck.
> 2. "Magic" in the same way that Google felt like magic 25 years ago: a widget/app/thing that knows what you want to do before even you know what you want to do.
And not to "dunk" on you or anything of the sort, but that's literally what Descartes seems to be? Another wrapper where I am writing prompts telling the AI what to do.
> And not to "dunk" on you or anything of the sort, but that's literally what Descartes seems to be? Another wrapper where I am writing prompts telling the AI what to do.
No offense taken at all; you're totally correct. I'm re-imagining it this year from scratch; it was just a little experiment I was working on (trying to combine OS + AI). Though, to be clear, it's built in Rust & it fully runs models locally, so it's not really a ChatGPT wrapper in the "I'm just calling an API" sense.