Comment by dvt

2 months ago

What we need, imo, is:

1. A new UX/UI paradigm. Writing prompts is dumb, re-writing prompts is even dumber. Chat interfaces suck.

2. "Magic" in the same way that Google felt like magic 25 years ago: a widget/app/thing that knows what you want to do before even you know what you want to do.

3. Learned behavior. It's ironic how even something like ChatGPT (it has hundreds of chats with me) barely knows anything about me & I constantly need to remind it of things.

4. Smart tool invocation. It's obvious that LLMs suck at logic/data/number crunching, but we have plenty of tools (like calculators or wikis) that don't. The fact that tool invocation is still in its infancy is a mistake. It should be at the forefront of every AI product.

5. Finally, we need PRODUCTS, not FEATURES; and this is exactly Pete's point. We need things that re-invent what it means to use AI in your product, not weirdly tacked-on features. Who's going to be the first team that builds an AI-powered operating system from scratch?

I'm working on this (and I'm sure many other people are as well). Last year, I worked on an MVP called Descartes[1][2] which was a spotlight-like OS widget. I'm re-working it this year after I had some friends and family test it out (and iterating on the idea of ditching the chat interface).

[1] https://vimeo.com/931907811

[2] https://dvt.name/wp-content/uploads/2024/04/image-11.png

22 comments

dvt

hermitShell 2 months ago

Agreed, our whole computing paradigm needs to shift at a fundamental level in order to let AI be 'magic', not just token prediction. Chatbots will provide some linear improvements, but ultimately I very much agree with you and the article that we're trapped in an old mode of thinking.

You might be interested in this series: https://www.youtube.com/@liber-indigo

In the same way that Microsoft and the 'IBM clones' brought us the current computing paradigm built on the desktop metaphor, I believe there will have to be a new OS built on a new metaphor. It's just a question of when those perfect conditions arise for lightning to strike on the founders who can make it happen. And just like Xerox and IBM, the actual core ideas might come from the tech giants (FAANG et al.) but they may not end up being the ones to successfully transition to the new modality.

nthingtohide 2 months ago

Feature Request: Can we have dark mode for videos? An AI OS should be able to understand and satisfy such a usecases.

E.g. Scott Aaronson | How Much Math Is Knowable?

https://youtu.be/VplMHWSZf5c

The video slides could be converted into a dark mode for night viewing.

jonahx 2 months ago

> 3. Learned behavior. It's ironic how even something like ChatGPT (it has hundreds of chats with me) barely knows anything about me & I constantly need to remind it of things.

I've wondered about this. Perhaps the concern is saved data will eventually overwhelm the context window? And so you must judicious in the "background knowledge" about yourself that gets remembered, and this problem is harder than it seems?

Btw, you can ask ChatGPT to "remember this". Ime the feature feels like it doesn't always work, but don't quote me on that.

dvt 2 months ago
Yes, but this should be trivially done with an internal `MEMORY` tool the LLM calls. I know that the context can't grow infinitely, but this shouldn't prevent filling the context with relevant info when discussing topic A (even a lazy RAG approach should work).
- otabdeveloper4 2 months ago
  
  What you're describing is just RAG, and it doesn't work that well. (You need a search engine for RAG, and the ideal search engine is an LLM with infinite context. But the only way to scale LLM context is by using RAG. We have infinite recursion here.)
- nthingtohide 2 months ago
  
  You are asking for a feature like this. Future advances will help in this.
  https://youtu.be/ZUZT4x-detM

sanderjd 2 months ago

On the tool-invocation point: Something that seems true to me is that LLMs are actually too smart to be good tool-invokers. It may be possible to convince them to invoke a purpose-specific tool rather than trying to do it themselves, but it feels harder than it should be, and weird to be limiting capability.

My thought is: Could the tool-routing layer be a much simpler "old school" NLP model? Then it would never try to do math and end up doing it poorly, because it just doesn't know how to do that. But you could give it a calculator tool and teach it how to pass queries along to that tool. And you could also give it a "send this to a people LLM tool" for anything that doesn't have another more targeted tool registered.

Is anyone doing it this way?

dvt 2 months ago
> Is anyone doing it this way?
I'm working on a way of invoking tools mid-tokenizer-stream, which is kind of cool. So for example, the LLM says something like (simplified example) "(lots of thinking)... 1+2=" and then there's a parser (maybe regex, maybe LR, maybe LL(1), etc.) that sees that this is a "math-y thing" and automagically goes to the CALC tool which calculates "3", sticks it in the stream, so the current head is "(lots of thinking)... 1+2=3 " and then the LLM can continue with its thought process.
- namaria 2 months ago
  
  Cold winds are blowing when people look at LLMs and think "maybe an expert system on top of that?".
  
  10 replies →
- sanderjd 2 months ago
  
  Definitely an interesting thought to do this at the tokenizer level!

erklik 2 months ago

> 1. A new UX/UI paradigm. Writing prompts is dumb, re-writing prompts is even dumber. Chat interfaces suck.

> 2. "Magic" in the same way that Google felt like magic 25 years ago: a widget/app/thing that knows what you want to do before even you know what you want to do.

and not to "dunk" on you or anything of the sort but that's literally what Descartes seems to be? Another wrapper where I am writing prompts telling the AI what to do.

dvt 2 months ago

> and not to "dunk" on you or anything of the sort but that's literally what Descartes seems to be? Another wrapper where I am writing prompts telling the AI what to do.
Not at all, you're totally correct; I'm re-imagining it this year from scratch, it was just a little experiment I was working on (trying to combine OS + AI). Though, to be clear, it's built in rust & it fully runs models locally, so it's not really a ChatGPT wrapper in the "I'm just calling an API" sense.