Comment by puttycat

2 days ago

Seems very incremental and very far from the pompous 'superintelligence' goal.

If you can collapse "retrieve this complex chunk when it is needed" into a single token, what else can you put into a token?

"Send this through the math coprocessor." "Validate against the checklist." "Call out to an agent for X." "Recheck against input stream Y." And so on.
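As a toy sketch of that idea (everything here is hypothetical and not from the paper: the token IDs, handler names, and dispatch loop are made up for illustration), a decoding loop could treat a handful of reserved vocabulary IDs as triggers for external actions rather than as text:

```python
# Hypothetical "action tokens" in a decoder's vocabulary.
# The IDs and handlers below are invented for illustration only.
RETRIEVE, MATH, VALIDATE = 50001, 50002, 50003

def retrieve(ctx):   # stand-in for a RAG lookup
    return ctx + ["<retrieved chunk>"]

def run_math(ctx):   # stand-in for a math-coprocessor call
    return ctx + ["<math result>"]

def validate(ctx):   # stand-in for a checklist validation
    return ctx + ["<validated>"]

HANDLERS = {RETRIEVE: retrieve, MATH: run_math, VALIDATE: validate}

def decode(token_stream):
    """Toy decoding loop: ordinary tokens are appended to the context;
    a special token dispatches to its handler, whose result is fed back
    into the context before generation continues."""
    ctx = []
    for tok in token_stream:
        if tok in HANDLERS:
            ctx = HANDLERS[tok](ctx)
        else:
            ctx.append(tok)
    return ctx

# Example: the model emits some text, then a single retrieval token.
print(decode(["The", "answer", RETRIEVE, "is", "42"]))
# → ['The', 'answer', '<retrieved chunk>', 'is', '42']
```

The point of the sketch is only that one token ID can stand in for a whole "do X now" operation, so adding a new capability is just reserving another ID and handler.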

Retrieval augmentation is only one of many uses for this. If this winds up with better integration with agents, it is very possible that the whole is more than the sum of its parts.

Think about it this way: they are encoding whole "thoughts" or "ideas" as single tokens.

It's effectively a multimodal model, which handles "concept" tokens alongside "language" tokens and "image" tokens.

A really big conceptual step, actually, IMO.

It’s unlikely that the existing LLM architecture will evolve into anything that resembles superintelligence any more than it does already.

Which means that modifications to the architecture, and combining it with other components and approaches, are the next likely step. This paper fits that.

A 30-fold improvement seems a tad more than incremental.

  • I can start brushing my teeth 30 times faster, but it won't change my life. This is nice for RAG, but it's a very localized improvement. And 30× sounds big, but it's still just an order-of-magnitude improvement.

    • Brushing your teeth is not central to your life; recalling facts correctly is, and a 30-fold improvement in the latter very well could change your life. I'll leave it to you to figure out which is the better analogy to RAG.
