Comment by sitkack
2 days ago
It is copium to think this will suddenly stop and the world they knew before will return.
ChatGPT came out in Nov 2022. "Attention Is All You Need" was published in 2017, so we were already five years behind, with five years of research to catch up on. And from 2022 to now, papers and research have been increasing exponentially. Even if SOTA models were frozen, we'd still have years of research to apply and optimize in various ways.
I think it's equally copium to keep assuming we're just going to compound our way into intelligence that generalizes well enough that we stop having to handhold the AI, as much as I'd genuinely enjoy that future.
Lately I spend all day post-training models for my product, and I want to say 99% of the research specific to LLMs doesn't reproduce and/or doesn't matter once you actually dig in.
We're getting exponentially more papers on these topics, and they're getting worse on average.
Every day there's a new paper claiming an X% gain by post-training some ancient 8B-parameter model and comparing it to a bunch of other ancient models, after overfitting on a benchmark's public dataset and giving the model best-of-5 attempts.
And benchmarks won't ever show it, but even GPT-3.5 Turbo has better general world knowledge than a lot of models people consider "frontier" models today, because post-training makes it easy to cover up those gaps with very impressive one-prompt outputs and strong benchmark scores.
-
It feels like things are getting stuck in a local maximum: we are making forward progress, and the models are useful and getting more useful, but the future people are envisioning requires reaching a completely different goalpost that I'm not at all convinced we're making exponential progress toward.
There may be an exponential number of techniques claiming to be groundbreaking, but what has actually unlocked new capabilities that can't just as easily be attributed to how much more focused post-training has become on coding and math?
Test-time compute feels like the only one, and we're already seeing cracks form in terms of its effect on hallucinations. There's also a clear ceiling on the performance the current iteration unlocks, as all these models are converging on pretty similar performance after just a few releases.
The copium, I think, is that many people got comfortable post-financial-crisis with nothing much changing or happening. Many people really liked a decade-long stretch with not much more than web framework updates and smartphone versioning.
We are just back on track.
I just read "Oracular Programming: A Modular Foundation for Building LLM-Enabled Software" the other day.
We don't even have a new paradigm yet. I would be shocked if, in 10 years, I don't look back on this era of typing a prompt into a chatbot and then pasting the code into an IDE as completely comical.
The most shocking thing to me is that we are right back on track with what, in 2000, I would have expected for 2025. In 2019, after nothing happening for so long, those expectations seemed like science-fiction delusions.
Reading the Oracular paper now.
It feels a bit like Halide, where the goal and the strategy (Halide calls them the algorithm and the schedule) are separated so that each can be optimized independently.
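For anyone unfamiliar, here's a rough sketch of that separation using Halide's Python bindings (the separable 3x3 blur from the Halide tutorials; the tile and vector sizes are illustrative, not tuned):

    import halide as hl

    x, y = hl.Var("x"), hl.Var("y")
    xi, yi = hl.Var("xi"), hl.Var("yi")
    inp = hl.ImageParam(hl.Float(32), 2, "inp")

    # The "goal" (algorithm): what to compute -- a separable 3x3 blur.
    blur_x = hl.Func("blur_x")
    blur_y = hl.Func("blur_y")
    blur_x[x, y] = (inp[x - 1, y] + inp[x, y] + inp[x + 1, y]) / 3
    blur_y[x, y] = (blur_x[x, y - 1] + blur_x[x, y] + blur_x[x, y + 1]) / 3

    # The "strategy" (schedule): how to compute it. Changing this
    # never changes the output, only the performance.
    blur_y.tile(x, y, xi, yi, 256, 32).vectorize(xi, 8).parallel(y)
    blur_x.compute_at(blur_y, x).vectorize(x, 8)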
Those new paradigms are being discovered by hordes of vibecoders, myself included. I am having wonderful results with TDD and AI-assisted design.
IDEs are now mostly browsers for code, and I no longer copy and paste to and from a chatbot.
Curious what you think about the Oracular paper. One area I have been working on for the last couple of weeks is extracting a tree of thoughts (ToT) for the domain and then using the LLM to generate an ensemble of exploration strategies over that tree.
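To make that concrete, a minimal sketch of the idea (ask_llm is a hypothetical stand-in for whatever model client you use, and all the policies and prompts are illustrative, not my production code):

    from dataclasses import dataclass, field

    def ask_llm(prompt: str) -> str:
        # Hypothetical stand-in; swap in your actual model call.
        raise NotImplementedError

    @dataclass
    class Node:
        thought: str
        children: list["Node"] = field(default_factory=list)

    def expand(node: Node, k: int = 3) -> None:
        # Ask the model for k candidate next thoughts from this node.
        for i in range(k):
            t = ask_llm(f"Given: {node.thought}\nPropose next step #{i + 1}:")
            node.children.append(Node(t))

    # Each "strategy" is just a policy for picking which frontier node
    # to expand next; the ensemble is several policies run over the
    # same tree, with their findings pooled afterwards.
    def depth_first(frontier: list[Node]) -> Node:
        return frontier.pop()

    def breadth_first(frontier: list[Node]) -> Node:
        return frontier.pop(0)

    def llm_guided(frontier: list[Node]) -> Node:
        # Let the model itself rank the frontier (assumes it replies
        # with a bare index; real code would parse defensively).
        listing = "\n".join(f"{i}: {n.thought}" for i, n in enumerate(frontier))
        idx = int(ask_llm(f"Pick the most promising index:\n{listing}"))
        return frontier.pop(idx)

    def explore(root: Node, pick, budget: int = 10) -> list[Node]:
        frontier, visited = [root], []
        while frontier and budget > 0:
            node = pick(frontier)
            expand(node)
            frontier.extend(node.children)
            visited.append(node)
            budget -= 1
        return visited

    # Run the ensemble and pool whatever each strategy surfaced.
    pooled = [explore(Node("problem statement"), policy)
              for policy in (depth_first, breadth_first, llm_guided)]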