Comment by com2kid
17 hours ago
Information in an LLM exists in two places:
1. Embedded in the parameters
2. Within the context window
We all talk a lot about #2, but until we get a really good grip on #1, I think we as a field are going to hit a progress wall.
The problem is that we haven't been able to separate knowledge embedded in the parameters from model capability. Famously, even if you don't want a model to write code, throwing a bunch of code into its training data makes it a better model overall. (Also famously, even if someone never works with math day to day, learning math makes them better at all sorts of related logical-reasoning tasks.)
Also, there's plenty of research showing that performance degrades as we stuff more and more into the context. This is why even the best models hit limits on tool-call performance when you naively throw 15+ JSON schemas at them. (The technique of using RAG to decide which tool-call schemas to feed into the context window is super cool!)
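A minimal sketch of that RAG-for-tool-selection trick, in case it helps (the tool names and the toy embed() are made up; in practice you'd call a real embedding model): embed each tool description once, then at request time send only the top-k schemas relevant to the user's query.

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    # Toy stand-in so the sketch runs: hash words into a bag-of-words vector.
    # In practice this would be a call to a real embedding model.
    v = np.zeros(256)
    for w in text.lower().split():
        v[hash(w) % 256] += 1.0
    return v

TOOLS = [
    {"name": "create_ticket", "description": "Open a Jira ticket with a title and body."},
    {"name": "search_docs", "description": "Full-text search over internal design docs."},
    {"name": "deploy_service", "description": "Trigger a deployment for a named service."},
    # ...imagine a dozen more schemas that would otherwise all go into context
]

# Embed every tool description once, offline.
tool_vecs = np.stack([embed(t["description"]) for t in TOOLS])

def select_tools(query: str, k: int = 3) -> list[dict]:
    # Cosine similarity between the query and each tool description,
    # keeping only the k best matches.
    q = embed(query)
    sims = tool_vecs @ q / (np.linalg.norm(tool_vecs, axis=1) * np.linalg.norm(q) + 1e-9)
    return [TOOLS[i] for i in np.argsort(-sims)[:k]]

# Only these few schemas get sent with the request instead of all 15+.
schemas = select_tools("file a bug about the signin button color")
```

The nice part is that the model never sees the irrelevant schemas at all, so they can't degrade its tool-call accuracy.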
I wonder if the next phase for leveraging LLMs against large sets of contextual, proprietary data (code repositories and knowledge bases come to mind) will look more like smaller models highly (and regularly) trained/fine-tuned against that proprietary data, with tasks perhaps delegated to them by the ultra-sized, internet-scale omni-brain models.
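Hand-waving, but the shape of that delegation might look something like this (chat(), the model names, and the plan format are all hypothetical, not any real API):

```python
def chat(model: str, prompt: str) -> str:
    # Placeholder for whatever inference API you use (hosted or local).
    raise NotImplementedError

def handle(task: str) -> str:
    # The big general model only plans; it never touches the repo directly.
    plan = chat("omni-brain-xl", f"Break this into repo-level steps:\n{task}")
    results = []
    for step in plan.splitlines():
        if step.strip():
            # Each concrete step goes to the small model fine-tuned on the
            # company's code, Jira history, and internal docs.
            results.append(chat("acme-code-7b-ft", step.strip()))
    return "\n".join(results)
```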
If I'm asking Sonnet to agentically make this signin button green, does it really matter that it can also write haikus about the Japanese landscape? That links back to your point: we have almost no grip on how much this crosstalk between problem domains matters. Maybe some of it actually does matter, but surely most of it doesn't.
We're so far from the endgame on these technologies. Part of me feels like we're spending too much effort and money on training ASI-grade, internet-scale models. I'm never going to pay $200+/mo for an even smarter Claude; what I need is a system that knows my company's code like the back of its hand, knows my company's patterns, technologies, and even its business (Jira boards, Google Docs, etc.), and extrapolates from that. That would be worth thousands a month. But what I'm describing isn't going to be solved by a 195-IQ gigabrain, and it doesn't feel like we're going to get there with context engineering either.