Comment by groundzeros2015

1 day ago

I just assume that people who are going to do useful things in ML have a basic foundation in math and science. If you don’t know what a derivative is, what are we doing talking about multi-variable optimization?

And it’s not about gate-keeping; it’s really about being able to reason about these concepts. What this looks like in programming is people memorizing a million clean code rules and not being able to write binary search.

When you learn calculus, you learn three things: the intuitions behind the concepts, the formal definitions of those concepts, and the techniques to efficiently solve problems using these concepts without a computer, such as integration by parts or by substitution.

If what you want to understand is neural networks, even at a deep level, you need a very good intuitive grasp of what derivatives are (without necessarily understanding what a limit is; if you really want to show a definition, teach the infinitesimal one). You also need to understand the rules of differentiation, which are relatively easy to explain once you have explained derivatives. You don't need other calculus concepts (like limits, sequences or integrals). You don't need the formal definitions. You don't need to solve large derivatives on paper, and you certainly don't need to be fast at it or able to do it in a closed-book exam setting.

There's a wide gulf between knowing what a derivative is and proficiently working out the derivatives of arbitrary functions. The extent of understanding required for most applied ML is "rate of change".
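To make the "rate of change" reading concrete, here is a minimal sketch. The one-parameter `loss` and the step size 0.1 are made up for illustration and are not from the thread; the point is only that knowing the slope is enough to know which way to move.

```python
def numerical_derivative(f, x, h=1e-5):
    """Central-difference estimate of how fast f changes at x."""
    return (f(x + h) - f(x - h)) / (2 * h)


def loss(w):
    # Hypothetical one-parameter loss with its minimum at w = 2.
    return (w - 2.0) ** 2


w = 5.0
for _ in range(50):
    slope = numerical_derivative(loss, w)  # rate of change of the loss at w
    w -= 0.1 * slope                       # step against the slope
```

No closed-form derivative is worked out anywhere here; the slope is estimated numerically and `w` still drifts toward the minimum.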

  • Is it that wide though? For example, how do you explain why you cannot autograd through sampling (and thus use either a reparameterization trick or Gumbel)? Sure, instead of relying on differentiability, you can explain it intuitively: "the output changes only when you literally reach the next threshold, so all the way in between you don't really get a good direction". But how far are you going to take this?
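The threshold intuition above can be sketched with the standard library alone. This is an illustration under assumptions, not anyone's actual training code: `hard_sample`, `finite_diff`, and `reparam_grad_mu` are hypothetical helper names, and the objective E[z^2] is picked only because its gradient with respect to mu is known to be 2 * mu.

```python
import random


def hard_sample(p, u):
    """Bernoulli draw with the noise u held fixed: a step function of p."""
    return 1.0 if u < p else 0.0


def finite_diff(f, x, h=1e-4):
    """Central-difference slope of f at x."""
    return (f(x + h) - f(x - h)) / (2 * h)


# With the noise fixed at u = 0.3, nudging p around 0.7 never crosses the
# threshold, so the sampled output does not move and the slope is zero.
slope_hard = finite_diff(lambda p: hard_sample(p, 0.3), 0.7)


def reparam_grad_mu(mu, sigma, n=100_000, seed=0):
    """Pathwise estimate of d/dmu E[z^2] with z = mu + sigma * eps."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        eps = rng.gauss(0.0, 1.0)  # the randomness lives in eps, not in mu
        z = mu + sigma * eps
        total += 2.0 * z           # d(z^2)/dmu = 2 * z * dz/dmu, and dz/dmu = 1
    return total / n


grad = reparam_grad_mu(mu=1.5, sigma=1.0)  # analytic value is 2 * mu = 3.0
```

The hard sample gives a flat signal almost everywhere, while rewriting the Gaussian draw as a deterministic function of `mu` plus external noise gives a usable gradient estimate.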

    I agree with your general point that we don't need insane levels of math, but I would say a college level of calculus, linear algebra and probability is the baseline.

    A basic benchmark off the top of my head:

    Being able to pick up, without stumbling on the fundamentals:

    - what LoRA is doing

    - how an RBF-kernel SVM works

    - why KL and reverse-KL are different

    - why using mean squared error is equivalent to MLE with a Gaussian likelihood

    Not saying the four above pieces are all necessary, but that you should be able to learn them on demand without needing to revisit what a basis vector is.
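For the last item in the list, the standard argument fits in two lines. This is a sketch that assumes i.i.d. targets $y_i = f_\theta(x_i) + \varepsilon_i$ with Gaussian noise of fixed variance $\sigma^2$; the symbols $f_\theta$ and $\sigma$ are chosen here for illustration and do not come from the thread.

```latex
\begin{aligned}
-\log p(y_{1:n} \mid x_{1:n}, \theta)
  &= -\sum_{i=1}^{n} \log \mathcal{N}\!\left(y_i \mid f_\theta(x_i), \sigma^2\right) \\
  &= \frac{n}{2}\log\!\left(2\pi\sigma^2\right)
     + \frac{1}{2\sigma^2}\sum_{i=1}^{n}\left(y_i - f_\theta(x_i)\right)^2
\end{aligned}
```

The first term and the factor $1/(2\sigma^2)$ do not depend on $\theta$, so minimizing the negative log-likelihood over $\theta$ is the same as minimizing the sum (or mean) of squared errors.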

    "Working out derivatives of arbitrary functions" is school level.

    • Rate of change -> it is flat -> that is not a useful signal. I don't see the issue?

      We aren't talking about doing cutting-edge research, just educating people on the basics of how ML does what it does. I agree that the things you list should follow at some point in the sequence for any rigorous education. But it's a question of at what point those things should come up and what the corresponding depth of education is.

      For the initial introduction I think everything you listed is entirely out of scope. You don't need any of that to get a basic MLP working using a for loop and naive gradient descent.
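As a rough idea of what "a basic MLP using a for loop and naive gradient descent" can look like, here is a minimal sketch in plain Python. Everything specific in it is an assumption chosen for illustration: the XOR data, the four tanh hidden units, the squared-error loss, the learning rate, and the function name `train_xor_mlp`.

```python
import math
import random


def train_xor_mlp(hidden=4, lr=0.1, epochs=2000, seed=0):
    """Train a 2-input, one-hidden-layer MLP on XOR, one sample at a time."""
    rng = random.Random(seed)
    data = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0),
            ([1.0, 0.0], 1.0), ([1.0, 1.0], 0.0)]

    # Parameters: tanh hidden layer (w1, b1) and linear output layer (w2, b2).
    w1 = [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    w2 = [rng.uniform(-1, 1) for _ in range(hidden)]
    b2 = 0.0

    def forward(x):
        h = [math.tanh(w1[j][0] * x[0] + w1[j][1] * x[1] + b1[j])
             for j in range(hidden)]
        y = sum(w2[j] * h[j] for j in range(hidden)) + b2
        return h, y

    def mean_loss():
        return sum(0.5 * (forward(x)[1] - t) ** 2 for x, t in data) / len(data)

    initial_loss = mean_loss()
    for _ in range(epochs):
        for x, t in data:
            h, y = forward(x)
            err = y - t  # dL/dy for L = 0.5 * (y - t)^2
            for j in range(hidden):
                # Chain rule: back through the output weight, then through tanh.
                grad_h = err * w2[j] * (1.0 - h[j] ** 2)
                w2[j] -= lr * err * h[j]
                w1[j][0] -= lr * grad_h * x[0]
                w1[j][1] -= lr * grad_h * x[1]
                b1[j] -= lr * grad_h
            b2 -= lr * err
    return initial_loss, mean_loss()


initial_loss, final_loss = train_xor_mlp()
```

The only calculus used is the derivative of a square, the derivative of tanh, and the chain rule; comparing `initial_loss` with `final_loss` shows whether the loop made progress.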
