
Comment by dauertewigkeit

3 months ago

CNNs and Transformers are both really simple and intuitive, so I don't think there was any stroke of genius in how they were devised.

Their success is due to datasets and the tooling that allowed models to be trained on large amounts of data sufficiently fast, using GPU clusters.

Exactly right, and neatly put by the author in the linked article:

> I spent years building ImageNet, the first large-scale visual learning and benchmarking dataset and one of three key elements enabling the birth of modern AI, along with neural network algorithms and modern compute like graphics processing units (GPUs).

Datasets + NNs + GPUs. Three "vastly different" advances that came together. ImageNet was THE dataset.

"CNNs and Transformers are both really simple and intuitive" and labeling a bunch of images you downloaded is not simple and intuitive? It was a team effort and I would hardly call a single dataset what drove modern ML. Most of currently deployed modern ML wasn't trained on that dataset and didn't come from models trained on it.