Comment by dboon
4 days ago
First, thanks for doing everything you do! I, and I’m sure countless others, genuinely benefit from you.
How would you recommend someone with a strong background in undergraduate-level traditional ML get into deep learning? I use that as a broad term encompassing all the knowledge needed to understand how these models work, starting from the deep learning models of a decade ago, plus the practical ability to collect data or build RL gyms and fine-tune these models.
I understand ML math well enough that I’m confident I could follow a modern white paper after a lot of effort and research. But there are so many pieces: quantization, flash attention, MoE, batch sizes, layer sizes, model sparsity. I feel very overwhelmed trying to piece together how they all arose, and even more overwhelmed trying to figure out how one even goes about fine-tuning a model. I (like most people here) am extremely technical, and it’s not often I feel this way about a field.
Thanks again! Best of luck on your work
As someone who has students working in deep learning, I can say that it is unwise to approach deep learning the same way as traditional ML. Most classical methods are strongly mathematically motivated and come with excellent theory. Deep learning is still alchemy: it is a matter of experience, of trying things out and getting a feel for how the pieces fit together in a modular fashion. Once you are experienced with the common building blocks, you can develop an intuition for how they might be improved.
I would start with training a basic MLP on tabular data. Then switch to CNNs: LeNet, VGG, then ResNet. Understand each of the new blocks incorporated into each architecture and how they improve stability and training efficiency. There are good PyTorch tutorials for these. Use them as a playground to understand what each of the training knobs does. Look at how their implicit biases induce double descent; this should give you confidence that overfitting is rarely an issue anymore. Give finetuning a try by taking a ResNet pretrained on ImageNet, adding layers at the start and end, and training only those to adapt the model to another image dataset. This should demonstrate the power of finetuning and why pretrained models are so powerful.
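To make the finetuning exercise concrete, here is a rough PyTorch sketch of the simplest variant, freezing the backbone and training only a new classification head (the class count and learning rate are placeholders):

```python
import torch
import torch.nn as nn
from torchvision import models

# Load a ResNet-18 pretrained on ImageNet.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Freeze every pretrained parameter so only the new head learns.
for param in model.parameters():
    param.requires_grad = False

# Swap in a fresh classification head sized for the new dataset.
num_classes = 10  # placeholder, e.g. CIFAR-10
model.fc = nn.Linear(model.fc.in_features, num_classes)

# Only the unfrozen parameters go to the optimizer.
optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-3
)
```

Training this for a few epochs on a small dataset converges far faster than training from scratch, which is the point of the exercise.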
Next, work briefly through a tutorial on LSTMs to see the exploding and vanishing gradient problems and the traditional challenges of sequential data firsthand.
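If you want to see the standard mitigation for exploding gradients in code, the usual trick is clipping the gradient norm between the backward pass and the optimizer step; a rough sketch with made-up sizes:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy sequence model; all sizes here are arbitrary.
lstm = nn.LSTM(input_size=64, hidden_size=128, num_layers=2, batch_first=True)
head = nn.Linear(128, 64)
params = list(lstm.parameters()) + list(head.parameters())
optimizer = torch.optim.Adam(params)

x = torch.randn(8, 100, 64)               # (batch, seq_len, features)
target = torch.randint(0, 64, (8, 100))   # next-token style labels

out, _ = lstm(x)                          # (batch, seq_len, hidden)
loss = F.cross_entropy(head(out).reshape(-1, 64), target.reshape(-1))
loss.backward()

# Clip the gradient norm to tame exploding gradients before the update.
torch.nn.utils.clip_grad_norm_(params, max_norm=1.0)
optimizer.step()
```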
Then move to transformers. Work with language first, starting from Andrej Karpathy's excellent YouTube tutorials. Train the model in full for a bit, then try initializing from an existing GPT-2 checkpoint. Try adapting nanoGPT to a mathematical dataset as an exercise. Then take a look at llm.c to see how to really improve performance.
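For the checkpoint step, Hugging Face's transformers library makes loading GPT-2 nearly a one-liner; a minimal sketch (the prompt and sampling settings are arbitrary):

```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer

# Download the pretrained GPT-2 weights and tokenizer.
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

# Sample a short continuation from the model.
inputs = tokenizer("Deep learning is", return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=20, do_sample=True)
print(tokenizer.decode(output_ids[0]))
```

nanoGPT can also initialize its training runs from these GPT-2 weights directly, so either path works.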
Finally, take a look at ViT and DETR. Use pretrained models and finetune them on smaller datasets again.
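The head-swap recipe from the ResNet exercise carries over almost unchanged; a sketch using torchvision's pretrained ViT-B/16 (the class count is again a placeholder):

```python
import torch.nn as nn
from torchvision import models

# Load a ViT-B/16 pretrained on ImageNet.
model = models.vit_b_16(weights=models.ViT_B_16_Weights.DEFAULT)

# Freeze the transformer backbone.
for param in model.parameters():
    param.requires_grad = False

# Replace the classification head for the target dataset.
model.heads.head = nn.Linear(model.heads.head.in_features, 10)
```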
By this point, you should have a good grounding to start reading much of the surrounding literature and understanding it. You should also understand that models are never built from scratch anymore; every model is a collection of individual pieces built elsewhere for a particular purpose.
Thank you, that was a wonderful reply!
> I’m confident I could follow a modern white paper after a lot of effort and research.
Without having done it for deep learning, I'm sure it is like any other area of computer science. You get to exactly the level you're at now, then you put in that effort following modern papers, and each one gets easier and easier. A year later you've done the literature review for your PhD. :)