Comment by hodgehog11
3 days ago
As someone who has students who work in deep learning, I can say that it is unwise to approach deep learning in the same way as traditional ML. Most classical methods are strongly mathematically motivated and have excellent theory to accompany them. Deep learning is still alchemy; it is a matter of experience, trying things out, and getting a feel for how the pieces fit together in a modular way. Once you are experienced with the common building blocks, you can develop an intuition for how they might be improved.
I would start by training a basic MLP on tabular data. Then switch to CNNs: LeNet, VGG, then ResNet. Understand the new blocks that each architecture introduces and how they improve stability and training efficiency. There are good PyTorch tutorials for these. Use them as a playground to understand what each of the training knobs does. Look at how the implicit biases of these models induce double descent; this should give you confidence that overfitting is rarely an issue anymore. Give finetuning a try by taking a ResNet pretrained on ImageNet, adding layers to the start and end, and training only those new layers to adapt the model to another image dataset. This should show why pretrained models are so powerful.
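To make the finetuning step concrete, here is a minimal sketch of the "freeze the pretrained part, train only the added layers" pattern in PyTorch. The tiny `backbone` below is only a stand-in I made up so the snippet runs without downloading weights; in practice you would put a ResNet pretrained on ImageNet there. The names `adapter` and `head`, the layer sizes, the random data, and the learning rate are all assumptions for illustration.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical stand-in for a pretrained backbone (assumption: a real run
# would load a ResNet with ImageNet weights here instead).
backbone = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
)

# Freeze the backbone: its parameters receive no gradients and are never updated.
for p in backbone.parameters():
    p.requires_grad = False
backbone.eval()

# New trainable layers added at the start and the end.
adapter = nn.Conv2d(3, 3, kernel_size=1)  # input-side layer
head = nn.Linear(8, 5)                    # output-side classifier, 5 classes
model = nn.Sequential(adapter, backbone, head)

# Only the new layers are handed to the optimizer.
trainable = list(adapter.parameters()) + list(head.parameters())
optimizer = torch.optim.SGD(trainable, lr=0.1)
loss_fn = nn.CrossEntropyLoss()

# Random data standing in for a small image dataset.
x = torch.randn(16, 3, 32, 32)
y = torch.randint(0, 5, (16,))

backbone_before = [p.detach().clone() for p in backbone.parameters()]
head_before = head.weight.detach().clone()

for _ in range(5):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()

# The frozen weights should be untouched, while the new head has moved.
backbone_unchanged = all(
    torch.equal(before, after.detach())
    for before, after in zip(backbone_before, backbone.parameters())
)
head_changed = not torch.equal(head_before, head.weight.detach())
```

Gradients still flow through the frozen backbone to reach `adapter`; freezing only stops the backbone's own weights from being updated.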
Next, briefly consider a tutorial on LSTMs, recognizing the exploding and vanishing gradient problems and the traditional challenges with sequential data.
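If it helps to see the gradient problem in numbers before opening an LSTM tutorial, here is a toy sketch. It is a scalar recurrence, not an LSTM, and the function name, weights, and step count are my own choices for illustration. The gradient of the final state with respect to the initial state is a product of one factor per time step, so it shrinks or grows geometrically with sequence length.

```python
import math

def gradient_through_time(w, steps, h0=0.5, use_tanh=True):
    """Return d h_T / d h_0 for the scalar recurrence h_t = f(w * h_{t-1}).

    With use_tanh=True, f is tanh; otherwise f is the identity.
    The chain rule gives a product of per-step factors.
    """
    h = h0
    grad = 1.0
    for _ in range(steps):
        pre = w * h
        if use_tanh:
            h = math.tanh(pre)
            grad *= w * (1.0 - h * h)  # derivative of tanh(w * h) w.r.t. h
        else:
            h = pre
            grad *= w                  # derivative of w * h w.r.t. h
    return grad

# |w| < 1: each factor is at most 0.5, so the gradient vanishes.
vanishing = gradient_through_time(0.5, steps=50)

# |w| > 1 without a squashing nonlinearity: the gradient explodes as 1.5**50.
exploding = gradient_through_time(1.5, steps=50, use_tanh=False)

print(f"vanishing: {vanishing:.3e}")
print(f"exploding: {exploding:.3e}")
```

The gating in an LSTM was designed to mitigate the vanishing case; gradient clipping is the usual remedy for the exploding one.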
Then move to transformers. Work with language first, starting from Andrej Karpathy's excellent YouTube tutorials. Train the model from scratch for a bit, then try starting from an existing GPT-2 checkpoint. Try adapting nanoGPT to a mathematical dataset as an exercise. Then take a look at llm.c to see how to really improve performance.
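For the mathematical dataset exercise, one possible starting point is a synthetic corpus of addition problems with a character-level vocabulary, similar in spirit to the character-level data preparation used in Karpathy's tutorials. This is only a sketch: `make_addition_corpus`, the `a+b=c` line format, the operand range, and the 90/10 split are all my assumptions, and writing the token ids out in whatever format your training script expects is left to you.

```python
import random

def make_addition_corpus(n_examples, max_operand=99, seed=0):
    """Build a text corpus with one addition problem per line, e.g. '12+34=46'."""
    rng = random.Random(seed)
    lines = []
    for _ in range(n_examples):
        a = rng.randint(0, max_operand)
        b = rng.randint(0, max_operand)
        lines.append(f"{a}+{b}={a + b}")
    return "\n".join(lines) + "\n"

text = make_addition_corpus(1000)

# Character-level vocabulary: every distinct character becomes one token.
chars = sorted(set(text))
stoi = {ch: i for i, ch in enumerate(chars)}
itos = {i: ch for ch, i in stoi.items()}

def encode(s):
    """Map a string to a list of integer token ids."""
    return [stoi[c] for c in s]

def decode(ids):
    """Map a list of integer token ids back to a string."""
    return "".join(itos[i] for i in ids)

# Tokenize and hold out the last 10% for validation.
ids = encode(text)
split = int(0.9 * len(ids))
train_ids, val_ids = ids[:split], ids[split:]

print(f"vocab size: {len(chars)}, train tokens: {len(train_ids)}, val tokens: {len(val_ids)}")
```

A tiny vocabulary like this keeps the model small enough to train quickly, and you can check generalization by prompting with `a+b=` for pairs that never appear in the training split.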
Finally, take a look at ViT and DETR. Use pretrained models and finetune them on smaller datasets again.
By this point, you should have a good grounding to start reading much of the surrounding literature and understand it. You should also understand that models are never built from scratch anymore, and every model is a collection of individual pieces built elsewhere for a particular purpose.
Thank you, that was a wonderful reply!