Comment by bttf
10 days ago
It sounds like the author of this article is in for a ... bitter lesson. [1]
[1] http://www.incompleteideas.net/IncIdeas/BitterLesson.html
Might happen. Or not. Reliable LLM-based systems that interact with a world model are still iffy.
Waymo is an example of a system which has machine learning, but the machine learning does not directly drive action generation. There's a lot of sensor processing and classifier work that generates a model of the environment, which can be seen on a screen and compared with the real world. Then there's a part which, given the environment model, generates movement commands. Unclear how much of that uses machine learning.
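For concreteness, here is a purely illustrative sketch of the split being described: a learned perception stage produces an inspectable environment model, and a separate planner turns that model into commands. The names and types are assumptions for illustration, not Waymo's actual stack.

    # Illustrative only -- not Waymo's actual architecture.
    from dataclasses import dataclass
    from typing import List, Tuple

    @dataclass
    class TrackedObject:
        kind: str                      # e.g. "vehicle", "pedestrian"
        position: Tuple[float, float]  # ego-frame (x, y), metres
        velocity: Tuple[float, float]

    @dataclass
    class EnvironmentModel:
        objects: List[TrackedObject]   # the part you can render on a screen and audit

    def perceive(sensor_frames) -> EnvironmentModel:
        """ML-heavy stage: detection, classification, tracking."""
        raise NotImplementedError

    def plan(env: EnvironmentModel) -> Tuple[float, float]:
        """Maps the environment model to (steering, throttle).
        Could be rule-based, optimization-based, learned, or a mix."""
        raise NotImplementedError

The point of the split is that the intermediate EnvironmentModel can be compared against the real world, which end-to-end systems don't give you.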
Tesla tries to use end to end machine learning, and the results are disappointing. There's a lot of "why did it do that?". Unclear if even Tesla knows why. Waymo tried end to end machine learning, to see if they were missing something, and it was worse than what they have now.
I dunno. My comment on this for the last year or two has been this: Systems which use LLMs end to end and actually do something seem to be used only in systems where the cost of errors is absorbed by the user or customer, not the service operator. LLM errors are mostly treated as an externality dumped on someone else, like pollution.
Of course, when that problem is solved, they'll be ready for management positions.
That they're also really unreliable at making reasonable API calls from input, as soon as any amount of complexity is introduced?
How so? The bitter lesson is about the effectiveness of specifically statistical models.
I doubt an expert system’s accuracy would change if you threw more energy at it, for example.
> The biggest lesson that can be read from 70 years of AI research is that general methods that leverage computation are ultimately the most effective, and by a large margin.
Is this at all ironic considering we power modern AI using custom and/or non-general compute, rather than using general, CPU-based compute?
GPUs can do general computation, they just saturate under different usage profiles.
I'd argue that GPU (and TPU) compute is even more general than CPU computation. Basically all it can do is matrix multiply types of operations!
The "bitter lesson" is extrapolating from ONE datapoint where we were extremely lucky with Dennart scaling. Sorry, the age of silicon magic is over. It might be back - at some point, but for now it's over.
The way things will scale isn't limited to optimizing low-level hardware; it also comes from brute-force investment in and construction of massive data centers, which is absolutely happening.
It also ignores quite a lot of neural network architecture development that happened in the meantime.
The transformer architecture IS the bitter lesson. It lets you scale your way to better performance with more data and computational resources. It was only after the fact that people came up with bespoke algorithms that increase the efficiency of transformers through human ingenuity. It turns out a lot of what transformers do is completely unnecessary, like the V cache, for example, but that doesn't matter in practice. Everyone is training their model with V caches, because they can start training their bleeding-edge LLM today, not after doing some risky engineering on a novel architecture.
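For readers unfamiliar with the cache being referred to: during autoregressive decoding, the per-token key and value projections are stored so each new token only attends against the stored entries instead of recomputing them. A minimal single-head NumPy sketch under standard scaled-dot-product-attention assumptions, not any particular production implementation:

    import numpy as np

    def decode_step(x_t, Wq, Wk, Wv, k_cache, v_cache):
        """One decoding step; x_t is the new token's embedding (d,),
        k_cache/v_cache are Python lists of previously computed rows."""
        q = x_t @ Wq
        k_cache.append(x_t @ Wk)        # grow the cache instead of recomputing
        v_cache.append(x_t @ Wv)
        K, V = np.stack(k_cache), np.stack(v_cache)   # (t, d)
        scores = K @ q / np.sqrt(q.shape[-1])
        w = np.exp(scores - scores.max())
        w /= w.sum()
        return w @ V                    # attention output for the new token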
The architectures before transformers were LSTM-based RNNs. They suck because they don't scale. Mamba is essentially the successor to RNNs, and its key benefit is that it can be trained in parallel (better compute scaling), yet Mamba models are still losing out to transformers because the ideal architecture for Mamba-based LLMs has not yet been discovered. Meanwhile, the performance hit of transformers is basically just a question of how many dollars you're willing to part with.
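To make the scaling point concrete, an illustrative NumPy sketch (toy code, not any real model) of why RNN training is inherently sequential while attention over a training sequence is one batched matrix computation:

    import numpy as np

    def rnn_forward(X, Wx, Wh):                   # X: (T, d), one row per timestep
        h = np.zeros(Wh.shape[0])
        hs = []
        for x_t in X:                             # h_t depends on h_{t-1}: no parallelism over T
            h = np.tanh(x_t @ Wx + h @ Wh)
            hs.append(h)
        return np.stack(hs)

    def causal_attention_forward(X, Wq, Wk, Wv):  # all T positions processed at once
        Q, K, V = X @ Wq, X @ Wk, X @ Wv
        scores = Q @ K.T / np.sqrt(X.shape[-1])
        scores += np.triu(np.full_like(scores, -1e9), k=1)  # mask out future positions
        w = np.exp(scores - scores.max(axis=-1, keepdims=True))
        w /= w.sum(axis=-1, keepdims=True)
        return w @ V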
just in time for the end of Moore's law
again?