Comment by HarHarVeryFunny
6 hours ago
There's a big difference between building an ML framework like TensorFlow or PyTorch (I built a Lua Torch-like one in C++ myself) and just using one to build/train a model.
Building the model may range from very simple, if you are just recreating a standard architecture, to a research endeavor, if you are designing something completely new.
The difficulty/complexity of then training the model depends on what the model is. For something simple like a CNN for image recognition, it's really just a matter of selecting a few hyperparameters and letting it rip. At the other end of the spectrum you've got LLMs, where training (and coping with instabilities) is something of a black art, RL training is completely different from pre-training, and there is also the issue of designing/discovering a pre-/mid-/post-training curriculum.
But anyways, the actual training part can be very simple, not requiring too much knowledge of what's going on under the hood, depending on the model.
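To make the framework-versus-usage split concrete, here is a minimal sketch in pure Python. It is not the C++ framework mentioned above and is nowhere near what TensorFlow or PyTorch do; the names `Value`, `train`, `lr`, `epochs` and `seed` are hypothetical, chosen only for illustration. The `Value` class stands in for the framework side (recording the computation graph and applying the chain rule), while `train` stands in for the user side, where fitting a toy linear model comes down to picking a learning rate and an epoch count.

```python
import random


class Value:
    """Framework side (hypothetical): a scalar that records how it was computed."""

    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents
        self._backward_fn = None  # pushes this node's grad to its parents

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))

        def backward_fn():
            self.grad += out.grad
            other.grad += out.grad

        out._backward_fn = backward_fn
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))

        def backward_fn():
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad

        out._backward_fn = backward_fn
        return out

    def backward(self):
        # Topologically sort the graph, then apply the chain rule in reverse.
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent in node._parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            if node._backward_fn is not None:
                node._backward_fn()


def train(xs, ys, lr=0.05, epochs=1000, seed=0):
    """User side (hypothetical): choose a few hyperparameters and let it run."""
    rng = random.Random(seed)
    w, b = Value(rng.uniform(-1, 1)), Value(rng.uniform(-1, 1))
    for _ in range(epochs):
        # Mean squared error of the linear model w * x + b.
        loss = Value(0.0)
        for x, y in zip(xs, ys):
            err = w * x + b + (-y)
            loss = loss + err * err
        loss = loss * (1.0 / len(xs))
        # Reset parameter gradients, backpropagate, take one gradient step.
        w.grad = b.grad = 0.0
        loss.backward()
        w.data -= lr * w.grad
        b.data -= lr * b.grad
    return w.data, b.data


xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]  # generated from y = 2x + 1
w, b = train(xs, ys)
print(w, b)  # should land close to 2 and 1
```

Everything in `train` can be written without knowing how `backward` works, which is the point being made about simple models. The black-art cases (LLM instabilities, RL training) are exactly the ones where that separation stops being enough.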