
Comment by SJC_Hacker

2 days ago

The math isn't that difficult. The transformers paper (https://proceedings.neurips.cc/paper_files/paper/2017/file/3...) was remarkably readable for such a high-impact paper, beyond the AI/ML-specific terminology (e.g. "attention") that gets thrown around.

Neural networks are basically just linear algebra (i.e. matrix multiplication) plus an activation function (ReLU, sigmoid, etc.) to generate non-linearities.
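That description fits in a few lines of NumPy. This is a minimal sketch of one feed-forward layer (the function names and shapes are my own, not from any particular library): a matrix multiply, a bias add, and a ReLU.

```python
import numpy as np

def relu(x):
    # element-wise non-linearity: max(0, x)
    return np.maximum(0, x)

def layer(x, W, b):
    # one neural-network layer: matrix multiplication plus activation
    return relu(x @ W + b)

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))    # a batch of 4 inputs with 8 features each
W = rng.normal(size=(8, 16))   # weight matrix mapping 8 features to 16
b = np.zeros(16)               # bias vector
h = layer(x, W, b)
print(h.shape)  # (4, 16)
```

Stacking a few of these (with different `W` and `b` per layer) is the whole "deep" part; everything else is about how you choose the weights.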

That's first-year undergrad material in most engineering programs - a fair number even take it in high school.

I'd like to reinforce this viewpoint. The math is non-trivial, but if you're a software engineer, you have the skills required to learn _enough_ of it to be useful in the domain. It's a subject that demands an enormous amount of rote learning - exactly like software engineering.

Hot take: I don't think you even need to understand much linear algebra/calculus to understand what a transformer does. The math for that could probably be learned within a week of focused effort.

  • Yeah, to be honest it's mostly the matrix multiplication, which I got in second-year algebra (high school).

    You don't even need to know about determinants, inverting matrices, Gauss-Jordan elimination, eigenvalues, etc. that you'd get in a first-year undergrad linear algebra course.
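To make the point concrete: the core of the transformers paper, scaled dot-product attention, really is just a couple of matrix multiplications plus a softmax. A minimal NumPy sketch (single head, no batching; the names are mine, not from any library):

```python
import numpy as np

def softmax(z, axis=-1):
    # numerically stable softmax: subtract the row max before exponentiating
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    # scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)   # similarity of each query to each key
    weights = softmax(scores)         # each row sums to 1
    return weights @ V                # weighted average of the values

rng = np.random.default_rng(1)
Q = rng.normal(size=(5, 8))  # 5 positions, dimension 8
K = rng.normal(size=(5, 8))
V = rng.normal(size=(5, 8))
out = attention(Q, K, V)
print(out.shape)  # (5, 8)
```

No determinants, no inverses, no eigenvalues - just matrix products, a division by a scalar, and an exponential.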