Comment by icoder
3 days ago
I can understand how this works, but it still feels like a 'hack' to me. It still feels like the LLMs themselves are plateauing while the applications get better by running the LLMs deeper, longer, and wider (and by adding 'non-AI' tooling/logic at the edges).
But maybe that's simply the solution, just like the solution for the original neural nets was (perhaps too simply put) to wait for exponentially better/faster hardware.
This is exactly how human society scaled from the cavemen era to today. We didn't need to make our brains bigger in order to get to the modern industrial age - increasingly sophisticated tool use and organization was all we did.
It only mattered that human brains were just big enough to enable tool use and organization. Size ceased to matter once our brains were past a certain threshold. I believe LLMs are past this threshold as well (they haven't 100% matched the human brain and maybe never will, but that doesn't matter).
An individual LLM call might lack domain knowledge, context and might hallucinate. The solution is not to scale the individual LLM and hope the problems are solved, but to direct your query to a team of LLMs each playing a different role: planner, designer, coder, reviewer, customer rep, ... each working with their unique perspective & context.
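A rough sketch of what that "team of roles" idea might look like, assuming a generic `complete()` wrapper around whatever model API you use; the role names and prompts are illustrative, not any particular product's API:

```python
# Minimal sketch of the "team of LLMs" idea: the same request is passed
# through several role-specific calls, each seeing the previous role's output.

ROLES = {
    "planner":  "Break the request into concrete steps.",
    "designer": "Propose an interface and architecture for those steps.",
    "coder":    "Write code implementing the design.",
    "reviewer": "Critique the result for bugs and missing requirements.",
}

def complete(system_prompt: str, user_prompt: str) -> str:
    """Placeholder for a real LLM call (e.g. an HTTP request to your provider)."""
    raise NotImplementedError

def run_team(request: str) -> str:
    context = request
    for role, system_prompt in ROLES.items():
        # Each role works on the accumulated context from its own perspective.
        context = complete(system_prompt, context)
    return context
```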
I get that feeling too - the underlying tech has plateaued, but now they're brute-force trading extra time and compute for better results. I don't know if that scales any better than, at best, linearly. Are we going to end up with 10,000 AI monkeys on 10,000 AI typewriters and a team of a dozen monkeys deciding which one's work they like the most?
> the underlying tech has plateaued, but now they're brute force trading extra time and compute for better results
You could say the exact same thing about the original GPT. Brute forcing has gotten us pretty far.
How much farther can it take us? Apparently they've started scaling out rather than up. When does the compute become too cost prohibitive?
Yes. It works pretty well.
grug think man-think also plateau, but get better with tool and more tribework
Pointy sticks and ASML's EUV machines were designed by roughly the same lumps of compute-fat :)
This is an interesting point. If this ends up working well after being optimized for scale it could become the dominant architecture. If not it could become another dead leaf node in the evolutionary tree of AI.
Isn't that kinda why we have collaboration and get in a room with colleagues to discuss ideas? i.e., thinking about different ideas, getting different perspectives, considering trade-offs in various approaches, etc. results in a better solution than just letting one person go off and try to solve it with their thoughts alone.
Not sure if that's a good parallel, but seems plausible.
Maybe this is the dawn of the multicore era for LLMs.
It's basically a mixture of experts, but instead of a learned operator picking the predicted-best model, you use a 'max' operator across all experts.
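A toy contrast between the two selection strategies, under the (loose) analogy above; real MoE gating happens inside the network per token, and the experts and judge here are just caller-supplied functions:

```python
import numpy as np

def learned_gate_pick(experts, x, gate_weights):
    # Mixture-of-experts style: a learned gate predicts which expert to use
    # *before* running it, so only one expert's compute is spent.
    scores = gate_weights @ x                  # hypothetical learned router
    return experts[int(np.argmax(scores))](x)

def max_over_outputs_pick(experts, x, judge):
    # The 'max' operator version: run every expert, then keep whichever
    # output the judge scores highest (best-of-N, far more compute).
    outputs = [expert(x) for expert in experts]
    return max(outputs, key=judge)
```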
You could argue that many aspects of human cognition are "hacks" too.
…like what? I thought the consensus was that humans exhibit truly general intelligence. If LLMs require access to very specific tools to solve certain classes of problems, then it’s not clear that they can evolve into a form of general intelligence.
What would you call the very specialized portions of our brains?
The brain is not a monolith.
They are, but I think the keyword is "generalization". Humans do very well when innovation is required, because innovation needs generalized models that can be used to make very specialized predictions, and then meta-models that can predict how specialized models relate to each other and cross-reference those predictions. We don't learn arithmetic by getting fed terabytes of text like "1+1=2". We only use text to communicate information, but learn the actual logic and concept behind arithmetic, and then we use that generalized model for arithmetic in our reasoning.
I struggle to imagine how much further a purely text-based system can be pushed - a system that basically knows that 1+1=2 not because it has built an internal model of arithmetic, but because it estimates that the sequence `1+1=` is mostly followed by `2`.
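A toy illustration of the distinction being drawn, with a made-up four-string "corpus" just to make the contrast concrete; neither function is how any real model works:

```python
from collections import Counter

corpus = ["1+1=2", "1+1=2", "1+1=3", "2+2=4"]

def statistical_completion(prefix: str) -> str:
    # "Knows" the answer only as the most frequent continuation seen in text.
    continuations = Counter(s[len(prefix):] for s in corpus if s.startswith(prefix))
    return continuations.most_common(1)[0][0]

def arithmetic_model(prefix: str) -> str:
    # Actually evaluates the expression before the '='.
    a, b = prefix.rstrip("=").split("+")
    return str(int(a) + int(b))

print(statistical_completion("1+1="))  # "2", because it is the majority pattern
print(arithmetic_model("1+1="))        # "2", because it was computed
```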
They do have something of an internal model of arithmetic, with lookup tables and separate treatment of digits. I'm conscious you might have seen this already and not interpret it that way, but in case you haven't, section 6 on addition in this Anthropic interpretability paper goes into it.
https://transformer-circuits.pub/2025/attribution-graphs/bio...
Keep in mind that this is a basic level of understanding of what is going on in quite a small model (Claude 3.5 Haiku). We don't know what is happening inside larger models.