Comment by skoocda

1 month ago

The one thing I really disagree with is the notion that there will be millions of identical AI images.

The next big step is continual learning, which enables long-term adaptive planning and "re-training" during deployment. An AI with continual learning will devote a larger portion of its physical deployment to the unique memories it develops through individual experience. The lines between history, input context, and training corpus will blur, and deployed agents will go down long paths of self-differentiation by choosing what to train themselves on. Eventually we'll end up with a diaspora of uniquely adapted agents.

Right now, inference consists of one massive set of weights and biases duplicated for every consumer, plus a tiny unique memory file that gets loaded in as context to "remind" the AI of the experiences it had (or did it?) with this one user or deployment. Clearly, this is cheap and useful for scaling up initially, but nobody wants to spend the rest of their life with an agent that is just a commodity image.

In the future, I think we'll realize that adding more encyclopedic knowledge is not a net benefit for most common agents (though we will provide access to niche knowledge behind "domain-specific" gates, like an MoE model but possibly via an MCP call), and we'll instead allocate a lot more physical capacity to storing and processing individualized knowledge. Agents will slow down on becoming more book smart, but will become more street smart. Whether or not this "street smart" knowledge ever gets relayed back to a central corpus probably depends mostly on the incentives for the agent.

Certainly my biggest challenge after a year of developing an industrial R&D project with AI assistance is that it needs way, way more than 400k tokens of context to understand the project properly. The emerging knowledge graph tools are a step in the right direction, but they're not nearly integrated enough. From my perspective, we're facing a fundamental limitation: as long as we're on the Transformer architecture with O(n^2) attention scaling, I will never get a sufficiently contextualized model response. Period.
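
To put rough numbers on the O(n^2) point, here is a back-of-the-envelope sketch in Python. It only counts entries in a naive full self-attention score matrix (one head, one layer) and ignores everything real systems do to soften this, such as KV caching, sparse or windowed attention, and memory-efficient kernels. The function names are made up for illustration.

```python
def attention_score_entries(context_tokens: int) -> int:
    """Entries in one naive full self-attention score matrix.

    Assumption: every token attends to every token, one head, one layer.
    """
    return context_tokens * context_tokens


def relative_cost(new_tokens: int, base_tokens: int) -> float:
    """How many times more score entries new_tokens needs than base_tokens."""
    return attention_score_entries(new_tokens) / attention_score_entries(base_tokens)


if __name__ == "__main__":
    base = 400_000  # the context size mentioned in the comment
    for n in (400_000, 800_000, 4_000_000):
        print(f"{n:>9} tokens: {attention_score_entries(n):.2e} entries, "
              f"{relative_cost(n, base):.0f}x the 400k cost")
```

Under those assumptions, doubling the context quadruples the pairwise work, and a 10x larger context costs 100x as much, which is why just buying a bigger window looks like a losing game here.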

You might notice this yourself if you ask Claude 4.5 (knowledge cutoff Jan 2025) to get up to speed on geopolitical topics from the past year. It is just not physically possible in 400k tokens. Architectures like Mamba or HOPE or Sutton's OAK may eventually fix this, and we'll see a long-term future resembling Excession, where individual agents develop in enormously different ways even if they came from the same base image.
