Comment by the8472
8 days ago
https://gwern.net/scaling-hypothesis — exponential scaling has been holding up for more than a decade now, since AlexNet.
And when the first murmurings came that maybe we were finally hitting a wall, the labs published ways to harness inference-time compute to get better results, which can be fed back into more training.
I sincerely appreciate the reply, but are you talking about Moore's law? AlexNet ran on commercially available GPUs in 2012, but consumer cards weren't the peak compute platform for DL inference at the time, so using them as the baseline distorts the progress a bit. It's like me saying I ran a neural net on a Raspberry Pi yesterday for handwritten digit recognition on MNIST and today I'm crunching Stable Diffusion on an RTX 3090. Behold, a trillion-fold leap in just a day (never mind the unrelated applications)! The singularity is definitely gonna happen tomorrow!
But let's take for granted that we are putting exponential growth in compute to good use. Even so, actual benchmarks show sublinear performance improvements [1]. Either way, it seems optimistic at best to conclude that 1000x more compute would yield even 10x better results in most domains; the rough power-law sketch below shows why.
[1] Fig. 1, "AI performance relative to the human baseline," Stanford HAI 2025 AI Index Report: https://hai.stanford.edu/ai-index/2025-ai-index-report
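
To put rough numbers on the sublinearity: here is a minimal back-of-the-envelope sketch in Python, assuming a Chinchilla-style power law L(C) = E + a*C^(-b). The constants E, a, b and the baseline compute c0 are illustrative assumptions (loosely in the range reported by Hoffmann et al., 2022), not a fit to any real benchmark.

```python
# Assumed Chinchilla-style power law: L(C) = E + a * C**(-b).
# E, a, b, and c0 are illustrative assumptions, not fitted values.

def loss(compute_flops: float, E: float = 1.69, a: float = 1e3, b: float = 0.15) -> float:
    """Irreducible loss E plus a power-law term that decays with training compute."""
    return E + a * compute_flops ** -b

c0 = 1e21  # assumed baseline training compute in FLOPs
for mult in (1, 10, 100, 1000):
    print(f"{mult:>4}x compute -> loss {loss(c0 * mult):.2f}")
```

Under those assumptions, 1000x more compute only moves the loss from about 2.40 to 1.94, nowhere near a 10x improvement, which is the same shape of sublinearity the AI Index curves show.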