Comment by HarHarVeryFunny

6 hours ago

There's a big difference between building an ML framework like TensorFlow or PyTorch (I built a Lua Torch-like one in C++ myself) and just using it to build/train a model.

Building the model may range from very simple, if you are just recreating a standard architecture, to a research endeavor, if you are designing something completely new.

The difficulty/complexity of then training the model depends on what it is. For something simple like a CNN for image recognition, it's really just a matter of selecting a few hyperparameters and letting it rip. At the other end of the spectrum you've got LLMs, where training (and coping with instabilities) is something of a black art, RL training is completely different from pre-training, and there is also the issue of designing/discovering a pre-/mid-/post-training curriculum.

But anyways, the actual training part can be very simple, not requiring too much knowledge of what's going on under the hood, depending on the model.
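
To make the framework-vs-usage split concrete, here's a rough sketch in plain Python. The `Value` class is a toy scalar autograd I made up for illustration (it is not how TensorFlow or PyTorch are actually implemented), and `train` is the "pick a few hyperparameters and let it rip" part, fitting a made-up line y = 2x + 1 rather than a CNN. The names, data and default hyperparameters are all assumptions.

```python
import random


# "Framework" side: a toy reverse-mode autograd over scalars (hypothetical).
class Value:
    def __init__(self, data, parents=(), local_grads=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents          # inputs that produced this value
        self._local_grads = local_grads  # d(self)/d(parent) for each input

    @staticmethod
    def _wrap(x):
        return x if isinstance(x, Value) else Value(x)

    def __add__(self, other):
        other = Value._wrap(other)
        return Value(self.data + other.data, (self, other), (1.0, 1.0))

    def __mul__(self, other):
        other = Value._wrap(other)
        return Value(self.data * other.data, (self, other),
                     (other.data, self.data))

    def backward(self):
        # Order the graph topologically, then apply the chain rule in reverse.
        order, seen = [], set()

        def visit(v):
            if id(v) not in seen:
                seen.add(id(v))
                for p in v._parents:
                    visit(p)
                order.append(v)

        visit(self)
        self.grad = 1.0
        for v in reversed(order):
            for parent, local in zip(v._parents, v._local_grads):
                parent.grad += local * v.grad


# "User" side: choose hyperparameters and run plain SGD.
def train(lr=0.05, epochs=200, seed=0):
    rng = random.Random(seed)
    # Made-up data lying exactly on y = 2x + 1.
    data = [(x, 2.0 * x + 1.0) for x in (-1.0, -0.5, 0.0, 0.5, 1.0)]
    w, b = Value(rng.uniform(-1.0, 1.0)), Value(0.0)
    for _ in range(epochs):
        for x, y in data:
            w.grad = b.grad = 0.0
            diff = w * x + b + (-y)
            loss = diff * diff           # squared error for one sample
            loss.backward()
            w.data -= lr * w.grad        # gradient descent step
            b.data -= lr * b.grad
    return w.data, b.data


if __name__ == "__main__":
    print(train())
```

The user-facing part is a dozen lines with two knobs (`lr`, `epochs`), and you can run it without understanding `backward` at all. The part above it is where the under-the-hood work is, and a real framework adds tensors, GPU kernels, many more ops, and so on.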