Comment by observationist

3 hours ago

I suspect we're going to need hypernetworks of some sort - dynamically generated weights, with the hypernet weights getting the dream-like reconsolidation and mapping into the model at large, and layers or entire experts generated from the hypernets on the fly, a degree removed from the direct-from-weights inference being done now. I've been following some of the token-free latent reasoning and other discussions around CoT, other reasoning scaffolding, and so forth, and you just can't overcome the missing puzzle piece problem elegantly unless you have online memory. In the context of millions of concurrent users, that also becomes a nightmare. Having a pipeline, with a sort of intermediate memory, constructive and dynamic to allow resolution of problems requiring integration into memorized concepts and functions, but held out for curation and stability.

It's an absolutely enormous problem, and I'm excited that it seems to be one of the primary research efforts kicking off this year. It could be a very huge capabilities step change.

1 comment

observationist

bluegatty 3 hours ago

Yes, so I think that's a fine thought, I don't think it fits into LLM architecture.

Also, weirdly, even Lecun etc. are barely talking about this, they're thinking about 'world models etc'.

I think what you're talking about is maybe 'the most important thing' right now, and frankly, it's almost like an issue of 'Engineering'.

Like - its when you work very intently with the models so this 'issue' become much more prominent.

Your 'instinct' for this problem is probably an expression of 'very nuanced use' I'm going to guess!

So in a way, it's as much Engineering as it is theoretical?

Anyhow - so yes - but - probably not LLM weights. Probably.

I'll add a small thing: the way that Claude Code keeps the LLM 'on track' is by reminding it! Literally, it injects little 'TODO reminders' with some prompts, which is kind of ... simple!

I worked a bit with 'steering probes' ... and there's a related opportunity there - to 'inject' memory and control operations along those lines. Just as a starting point for a least one architectural motivation.