Comment by dachworker
2 days ago
ML research is ripe for such a subculture to emerge, because there are truly so many research directions that are nothing more than a house of cards waiting to be exposed. You need an element of truth to capture your audience. But once you have an audience and have already torn down the house of cards, you start looking for more content. And then you end up like Sabine.
Maybe at some point, but as of now it's much more applied and empirical. Aside from money, there's nothing stopping you from training a model with a new architecture or loss function and sharing the weights for everyone to use.
Very recently, researchers at a Chinese lab introduced a new optimizer, MuonClip, which they claim is better for certain types of LLM training. I don't think there are enough AdamW fanboys out there for it to cause a controversy. Either it works or it doesn't.
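For context, my understanding is that MuonClip is Muon (orthogonalized momentum updates for 2-D weight matrices) plus a "QK-clip" that rescales query/key projections to keep attention logits bounded. Here's a minimal numpy sketch of just the Muon core, assuming the quintic Newton-Schulz coefficients from Keller Jordan's public implementation; the hyperparameters are illustrative and the QK-clip part is omitted:

```python
import numpy as np

def newton_schulz(G, steps=5, eps=1e-7):
    # Approximately orthogonalize G with a quintic Newton-Schulz iteration.
    # Coefficients as in Keller Jordan's public Muon implementation.
    a, b, c = 3.4445, -4.7750, 2.0315
    X = G / (np.linalg.norm(G) + eps)
    transpose = G.shape[0] > G.shape[1]
    if transpose:
        X = X.T
    for _ in range(steps):
        A = X @ X.T
        X = a * X + (b * A + c * A @ A) @ X
    return X.T if transpose else X

def muon_step(W, G, M, lr=0.02, beta=0.95):
    # One Muon update for a 2-D weight matrix W with gradient G and
    # momentum buffer M. MuonClip would additionally rescale the
    # query/key weights to cap attention logits (not shown here).
    M = beta * M + G           # momentum accumulation
    O = newton_schulz(M)       # orthogonalized update direction
    W = W - lr * O
    return W, M
```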
Applied ML is truly blessed by being incredibly empirical.
So many crackpots get filtered by "oh, if your new theory is so good and powerful, then show a small-scale system built on it". This hard-filters 99% of crackpots, and the remaining 1% usually build something that performs within measurement error of existing systems.
Grand Theories Of Everything don't have such a filter. There is no easy demonstration to perform, no simple experiment to run, that would show whether string theory has merit. So we get very questionable theories, then a lot of even more questionable theories, and then crackpots and madmen as far as the eye can see.
The curse on physics isn't that it has crackpots. It's that the remaining unsolved problems are incredibly hard, the space of solutions is vast, and there isn't enough experimental data coming in to quickly weed out the obviously wrong ones.
Very well said. I also think the goal of crackpots isn't to create something useful but to have their names next to Maxwell, Einstein, and Hawking, household names of geniuses. The accomplishment itself matters less to them than the recognition.
> the remaining 1% usually build something that performs within measurement error of existing systems
Another point people miss when they wax poetic about how neural nets are like brains or whatnot: we didn't pick transformers because they were the most elegant method. We use them because they work really well on the hardware we have. It's why RNNs fell out of favor: each step depends on the previous hidden state, so training can't be parallelized across the sequence, while attention computes every position at once (see the sketch below). Transformers made training on the whole internet possible.
Maybe RNNs can be salvaged, but my guess is they'd end up roughly as good as transformers, just slower to train.
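To make the parallelism point concrete, here's a toy numpy sketch (the shapes, random weights, and single-head causal attention are all made up for illustration) contrasting the RNN's inherently sequential time loop with attention's one batched matmul:

```python
import numpy as np

T, d = 6, 4                       # sequence length, hidden size
x = np.random.randn(T, d)

# RNN: each step needs the previous hidden state, so the time loop
# is inherently sequential -- no parallelism across the T positions.
Wh, Wx = np.random.randn(d, d), np.random.randn(d, d)
h = np.zeros(d)
for t in range(T):
    h = np.tanh(h @ Wh + x[t] @ Wx)

# Self-attention: every position is computed in a few batched matmuls,
# so the whole sequence is processed in parallel on a GPU.
Wq, Wk, Wv = (np.random.randn(d, d) for _ in range(3))
Q, K, V = x @ Wq, x @ Wk, x @ Wv
scores = Q @ K.T / np.sqrt(d)
mask = np.triu(np.ones((T, T)), k=1).astype(bool)  # hide future tokens
scores[mask] = -np.inf
attn = np.exp(scores - scores.max(axis=-1, keepdims=True))
attn /= attn.sum(axis=-1, keepdims=True)
out = attn @ V                    # all T positions at once
```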