Comment by miki123211

1 day ago

I wish more people pursued that approach to teaching neural networks.

First teach what the network does and why, writing it as a loopy, inference-only Python function. Explain training only in an abstract way, e.g. with the "take a random weight, twist it a little and see if the loss improves" algorithm. This lets you focus on the architecture and on why it is what it is.
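
A minimal sketch of that first stage, assuming a made-up network with two inputs and three tanh hidden units, and the XOR truth table as a toy dataset. The names `forward`, `loss` and `train_by_twisting` are mine, not from any library:

```python
import math
import random


def forward(weights, inputs):
    """Inference only: one hidden layer, written with plain loops."""
    hidden_activations = []
    for neuron_weights in weights["hidden"]:
        total = neuron_weights[-1]  # last entry of each row is the bias
        for weight, value in zip(neuron_weights, inputs):
            total += weight * value
        hidden_activations.append(math.tanh(total))

    output_weights = weights["output"]
    output = output_weights[-1]  # bias of the output neuron
    for weight, activation in zip(output_weights, hidden_activations):
        output += weight * activation
    return output


def loss(weights, dataset):
    """Mean squared error over the whole dataset."""
    total = 0.0
    for inputs, target in dataset:
        error = forward(weights, inputs) - target
        total += error * error
    return total / len(dataset)


def train_by_twisting(weights, dataset, steps=2000, nudge=0.1, seed=0):
    """Pick a random weight, twist it a little, keep the change only if the loss improves."""
    rng = random.Random(seed)
    all_rows = weights["hidden"] + [weights["output"]]
    best = loss(weights, dataset)
    for _ in range(steps):
        row = rng.choice(all_rows)
        index = rng.randrange(len(row))
        old_value = row[index]
        row[index] = old_value + rng.uniform(-nudge, nudge)
        new = loss(weights, dataset)
        if new < best:
            best = new
        else:
            row[index] = old_value  # undo the twist
    return best


# Toy data (an assumption for illustration): the XOR truth table.
xor_dataset = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0),
               ([1.0, 0.0], 1.0), ([1.0, 1.0], 0.0)]

init_rng = random.Random(1)
weights = {
    "hidden": [[init_rng.uniform(-1, 1) for _ in range(3)] for _ in range(3)],
    "output": [init_rng.uniform(-1, 1) for _ in range(4)],
}

initial_loss = loss(weights, xor_dataset)
final_loss = train_by_twisting(weights, xor_dataset)
print(initial_loss, final_loss)
```

It needs one full pass over the data per twist, so it is far slower than gradient descent, but nothing in it requires knowing what a derivative is.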

Then, teach the intuitions behind derivatives and gradient descent. You don't need the entirety of calculus; there's no benefit to knowing how a sequence or limit works if you only want to understand neural networks. With autograd, you won't be manually doing derivatives of weird functions either, so intuitive understanding is a lot more important than doing dozens of traditional calculus exercises on paper like it's the 1800s. You could probably explain the little bit of calculus you need in an hour or two, even to somebody with a 12-year-old's understanding of math and a good bit of programming knowledge.
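
Roughly what I have in mind for the "derivative as rate of change" part, as a sketch. The function `bowl` and the helper names are invented for the example, and a real framework would get the slope from autograd instead of nudging the input:

```python
def slope(f, x, tiny=1e-6):
    """Rate of change of f at x: nudge x a little each way and compare the outputs."""
    return (f(x + tiny) - f(x - tiny)) / (2 * tiny)


def gradient_descent(f, x, learning_rate=0.1, steps=100):
    """Repeatedly step downhill, i.e. against the slope."""
    for _ in range(steps):
        x -= learning_rate * slope(f, x)
    return x


def bowl(x):
    """Toy loss with its lowest point at x = 3."""
    return (x - 3.0) ** 2 + 1.0


minimum = gradient_descent(bowl, x=0.0)
print(slope(bowl, 0.0), minimum)
```

At `x = 0` the slope is negative, so stepping against it moves `x` to the right, towards the bottom of the bowl. That is the whole idea. The rest is doing it for millions of weights at once.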

Only when people understand the training and inference, implemented with loops and descriptive variable names, teach the tensor, explain how a modern CPU and GPU work (because many programmers still think a modern computer is just a much faster 6502), and then teach the tricks we use to make it fast.

I just assume that people who are going to do useful things in ML have a basic foundation in math and science. If you don’t know what a derivative is, what are we doing talking about multi-variable optimization?

And it’s not about gate-keeping; it’s really about being able to reason about these concepts. What this looks like in programming is people memorizing a million clean code rules and not being able to write binary search.

  • When you learn calculus, you learn three things: the intuitions behind the concepts, the formal definitions of those concepts, and the techniques to efficiently solve problems using these concepts without a computer; things like integration by parts or by substitution.

    If what you want to understand is neural networks, even at a deep level, you need a very good intuitive grasp of what derivatives are (without necessarily understanding what a limit is; if you really want to show a definition, teach the infinitesimal). You also need to understand the rules of differentiation, which you can relatively easily explain if you explain derivatives. You don't need other calculus concepts (like limits, sequences or integrals). You don't need the formal definitions. You don't need to solve large derivatives on paper, and you certainly don't need to be fast at it and be able to do it in a closed-book exam setting.

  • There's a wide gulf between knowing what a derivative is and proficiently working out the derivatives of arbitrary functions. The extent of understanding required for most applied ML is "rate of change".

    • Is it that wide though? For example, how do you explain why you cannot autograd through sampling (and thus use either a reparameterization trick or Gumbel)? Sure, instead of relying on differentiability, you can intuitively explain it as "the output changes only when you literally reach the next threshold, so all the way in between you don't really get a good direction", but how far are you going to take this?
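
      To make the threshold picture concrete, here is a small sketch. The noise values `u` and `eps` are hard-coded stand-ins for random draws, and the function names are made up for the example. It measures the slope of a discrete sample and of a reparameterized Gaussian sample by nudging the parameter:

```python
def sample_coin(p, u):
    """Discrete sample: 1.0 if the uniform draw u falls below p, else 0.0."""
    return 1.0 if u < p else 0.0


def sample_gaussian_reparam(mu, sigma, eps):
    """Reparameterized sample: eps is standard normal noise drawn independently of mu and sigma."""
    return mu + sigma * eps


def slope(f, x, tiny=1e-4):
    """Rate of change of f at x, measured by nudging x both ways."""
    return (f(x + tiny) - f(x - tiny)) / (2 * tiny)


# Assumed values standing in for one uniform draw and one standard normal draw.
u = 0.8
eps = -0.7

# Nudging p does not move the sample unless p sits right at the threshold u.
coin_slope = slope(lambda p: sample_coin(p, u), 0.3)

# With the noise held fixed, the sample is a smooth function of mu and sigma.
gauss_slope_mu = slope(lambda mu: sample_gaussian_reparam(mu, 2.0, eps), 0.5)
gauss_slope_sigma = slope(lambda sigma: sample_gaussian_reparam(0.5, sigma, eps), 2.0)

print(coin_slope, gauss_slope_mu, gauss_slope_sigma)
```

      The coin's slope is zero almost everywhere, which is the "no good direction" problem. The reparameterized sample has slope 1 with respect to `mu` and `eps` with respect to `sigma`.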

      I agree with your general point, that we don't need insane levels of math, but I would say a college level of calculus, linalg and probability is baseline.

      A basic benchmark off the top of my head:

      Being able to pick up, without stumbling on the fundamentals:

      - what LoRA is doing

      - how an RBF-kernel SVM works

      - why KL and reverse-KL are different

      - why using mean squared error is equivalent to MLE on a Gaussian

      Not saying the four above pieces are all necessary, but that you should be able to learn them on demand without needing to revisit what a basis vector is.
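
      As a rough illustration of the last two items, here is a sketch with made-up numbers. The distributions `p` and `q`, the predictions and the fixed `sigma` are all assumptions for the example, and `kl` assumes `q` is nonzero wherever `p` is:

```python
import math


def kl(p, q):
    """KL(p || q) for two discrete distributions given as lists of probabilities."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def mse(predictions, targets):
    """Mean squared error."""
    return sum((y - t) ** 2 for y, t in zip(predictions, targets)) / len(targets)


def gaussian_nll(predictions, targets, sigma=1.0):
    """Negative log-likelihood of targets under Normal(prediction, sigma^2)."""
    constant = 0.5 * math.log(2 * math.pi * sigma ** 2)
    return sum(constant + (t - y) ** 2 / (2 * sigma ** 2)
               for y, t in zip(predictions, targets))


# KL is not symmetric: swapping the arguments gives a different number.
p = [0.5, 0.4, 0.1]
q = [0.8, 0.1, 0.1]
forward_kl = kl(p, q)
reverse_kl = kl(q, p)

# With sigma fixed, the Gaussian NLL is the MSE scaled by n / (2 sigma^2)
# plus a constant, so both are minimized by the same predictions.
targets = [1.0, 2.0, 4.0]
predictions = [1.5, 1.5, 3.0]
sigma = 2.0
n = len(targets)
constant = 0.5 * math.log(2 * math.pi * sigma ** 2)
nll = gaussian_nll(predictions, targets, sigma)
nll_from_mse = n * constant + n / (2 * sigma ** 2) * mse(predictions, targets)

print(forward_kl, reverse_kl, nll, nll_from_mse)
```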

      "Working out derivatives of arbitrary functions" is school level.
