Comment by sd9
3 days ago
Why are some models better than others today if everything is publicly known and many organisations have access to massive resources?
Somebody has to come up with an idea first. Before they share it, it is not publicly known. Ilya has previously come up with plenty of productive ideas. I don't think it's a stretch to think that he has some IP that is not publicly known.
Even seemingly simple things like how you shuffle your training set, how you augment it, the specific architecture of the model, etc., have dramatic effects on the outcome.
> Somebody has to come up with an idea first.
There are lots of ideas. Some may work.
The space in which people seem to be looking is deep learning on something other than text tokens. Yet most successes punt on feature extraction / "early vision" and just throw compute at raw pixels. That's the "bitter lesson" approach, which seems to be hitting a ceiling set by how many gigawatts of data center you can afford.
Is there a useful non-linguistic abstraction of the real world that works and leads to "common sense"? Squirrels must have something; they're not verbal and have a brain the size of a peanut. But what?