Comment by Mehvix

4 months ago

Why do you suppose this is a compute limited problem?


Mehvix

It's kind of a shortcut answer by now, especially for anything that touches pretraining.

Someone asks "Why aren't we doing X?", where X is a thing that sounds sensible, seems like it would help, does indeed help, and even has a paper proving that it helps.

The answer is: check the paper. It says on page 12, in a throwaway line, that they used 3 times as much compute for the new method as for the controls. And the gain was +4%.

A lot of promising things are resource hogs, and there are too many better things to burn the GPU-hours on.
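
To put a rough number on that trade-off, here is a minimal sketch using the figures from the example above (3x compute, +4%). The function name `gain_per_extra_compute` is made up for illustration, and it assumes the +4% is a relative improvement over the control; if it is percentage points on some benchmark, the ratio means something different.

```python
def gain_per_extra_compute(relative_gain: float, compute_multiplier: float) -> float:
    """Relative gain bought per extra unit of baseline compute.

    compute_multiplier is total compute relative to the control
    (3.0 means "3 times the control's compute").
    """
    extra = compute_multiplier - 1.0
    if extra <= 0:
        raise ValueError("compute_multiplier must be greater than 1")
    return relative_gain / extra


# Figures from the example: 3x compute for a +4% gain (assumed relative).
print(gain_per_extra_compute(0.04, 3.0))
```

This only says what the extra compute bought for that one method. It says nothing about what the control would have done with the same 3x budget, which is the comparison that actually matters.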

typpilol 4 months ago

Thanks.
Also, saying it needs 20x compute is exactly that. It's something we could do eventually, but not now.