Comment by mlazos

9 hours ago

You don’t need to be deep into designing NNs or the theory behind them, but I would say you should be able to take some linear algebra equations and map them onto the GPU architecture. That does require some knowledge of the math being used. Luckily it’s mostly high-school/college level math. The CUDA and Triton (triton-lang) docs are a good starting point. They’ll teach you about common optimizations like tiling, thread swizzling, and maximizing cache utilization.
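To make "tiling" a bit more concrete, here is a rough CPU-only sketch in plain Python of the matrix product C = A·B computed block by block. It is not GPU code and not taken from the CUDA or Triton docs; the function names and the `tile` parameter are made up for illustration. On a real GPU, each (i0, j0) output tile would typically be handled by one thread block, with the A and B tiles staged in fast on-chip memory.

```python
def matmul_naive(a, b):
    """Reference C = A @ B using the textbook triple loop."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            for p in range(k):
                c[i][j] += a[i][p] * b[p][j]
    return c


def matmul_tiled(a, b, tile=2):
    """Same product, but iterating over tile x tile blocks.

    Hypothetical illustration: `tile` stands in for the block size a
    GPU kernel would pick so that the working set fits in fast memory.
    """
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i0 in range(0, n, tile):          # row block of C
        for j0 in range(0, m, tile):      # column block of C
            for p0 in range(0, k, tile):  # block along the shared dimension
                # Work only inside the current tiles; min() handles edges
                # when the matrix size is not a multiple of `tile`.
                for i in range(i0, min(i0 + tile, n)):
                    for j in range(j0, min(j0 + tile, m)):
                        acc = c[i][j]
                        for p in range(p0, min(p0 + tile, k)):
                            acc += a[i][p] * b[p][j]
                        c[i][j] = acc
    return c
```

Both functions compute the same sums; the tiled version just reorders the loops so each small block of A and B is reused many times while it is "hot", which is the property the GPU versions exploit.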