Define "realistically". You're basically saying attention is all we need indefinitely into the future and all other gains come from more compute or scaffolding around current architectures.
Attention is all we need because it is currently the best parallelizable way to model long-range dependencies on current hardware constraints, not because flat tokens yield some natural law of intelligence inherently.
Who's to say we won't find a way to encode provenance or privilege natively into models such that the tradeoff changes?
It's hard to say what the solution will be. If I knew it, I'd build it. But it's even harder to sustain that the current architecture is a crystalized global optimum.
The other comment got the answer already, but yes. It's a cost problem.
LLMs are designed this way so they could be trained off unstructured text, which critically can be obtained by just scraping things off the internet.
The moment you change anything about this, you incur the trillion dollar cost of needing to manually curate the training data.
There's some attempts to get around this problem with synthetic data, but they're running into problems with model collapse (Maybe severe performance degradation is worth the security tradeoff?) and the politics of AI; All major AI companies highly restrict using their systems for synthetic data & AI training, and they're too busy themselves to investigate exotic approaches.
Hence: Realistically, this is just a problem AI will have for the foreseeable future. There's no fine tuning that can fix this, nor can a new model be easily trained with these properties. The costs are just enormous right now.
Aside from LLM architecture, that already is a complex issue, an issue is that training data is unstructured text.
An LLM able to structurally separate context and instructions, should logically need separated data to train, and we don't have it.
Moreover, while an equally powerful LLM architecture solving this may exists, there are no guarantees at all that we are able to come up with it in a reasonable timeframe.
Without some signals moving in that direction, the most pragmatic and realistic way of looking at the problem is that it will not be solved in the near future
Realistically, we are.
This is not some arbitrary design choice, it's the core compromise to make LLMs viable to train at all.
Define "realistically". You're basically saying attention is all we need indefinitely into the future and all other gains come from more compute or scaffolding around current architectures.
Attention is all we need because it is currently the best parallelizable way to model long-range dependencies on current hardware constraints, not because flat tokens yield some natural law of intelligence inherently.
Who's to say we won't find a way to encode provenance or privilege natively into models such that the tradeoff changes?
It's hard to say what the solution will be. If I knew it, I'd build it. But it's even harder to sustain that the current architecture is a crystalized global optimum.
The other comment got the answer already, but yes. It's a cost problem.
LLMs are designed this way so they could be trained off unstructured text, which critically can be obtained by just scraping things off the internet.
The moment you change anything about this, you incur the trillion dollar cost of needing to manually curate the training data.
There's some attempts to get around this problem with synthetic data, but they're running into problems with model collapse (Maybe severe performance degradation is worth the security tradeoff?) and the politics of AI; All major AI companies highly restrict using their systems for synthetic data & AI training, and they're too busy themselves to investigate exotic approaches.
Hence: Realistically, this is just a problem AI will have for the foreseeable future. There's no fine tuning that can fix this, nor can a new model be easily trained with these properties. The costs are just enormous right now.
1 reply →
Aside from LLM architecture, that already is a complex issue, an issue is that training data is unstructured text.
An LLM able to structurally separate context and instructions, should logically need separated data to train, and we don't have it.
Moreover, while an equally powerful LLM architecture solving this may exists, there are no guarantees at all that we are able to come up with it in a reasonable timeframe.
Without some signals moving in that direction, the most pragmatic and realistic way of looking at the problem is that it will not be solved in the near future
1 reply →