Comment by marketingan

1 day ago

Deep learning is just glorified linear algebra. Master the progression: Feed-forward → CNN → RNN → LSTM → Attention. You don't even need a GPU to understand the climax; Karpathy’s llama2.c implements a full transformer inference engine in just ~300 lines of C using SIMD pragmas for CPU execution.

I wish more people pursued that approach to teaching neural networks.

First teach what the network does and why, writing it as a loopy, inference-only Python function. Explain training only in an abstract way, e.g. with the "take a random weight, twist it a little and see if the loss improves" algorithm. This lets you focus on the architecture and on why it is what it is.
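
To make that concrete, here is a minimal sketch of what such a loopy, inference-only function plus the "twist a random weight" trainer might look like. The network shape (two inputs, four hidden ReLU neurons, one output), the XOR toy dataset, and every name below are assumptions picked for illustration, not something from a particular course or library.

```python
import random


def forward(weights, x):
    """Loopy, inference-only forward pass: one hidden ReLU layer, one output."""
    hidden = []
    for neuron_weights in weights["hidden"]:
        total = neuron_weights[-1]  # last entry of each row is the bias
        for input_index in range(len(x)):
            total += neuron_weights[input_index] * x[input_index]
        hidden.append(max(0.0, total))  # ReLU
    output = weights["output"][-1]  # output bias
    for hidden_index in range(len(hidden)):
        output += weights["output"][hidden_index] * hidden[hidden_index]
    return output


def loss(weights, dataset):
    """Mean squared error over the whole dataset."""
    total = 0.0
    for x, target in dataset:
        error = forward(weights, x) - target
        total += error * error
    return total / len(dataset)


def twist_one_weight(weights, dataset, rng, step=0.1):
    """Pick a random weight, nudge it, and keep the nudge only if the loss improves."""
    if rng.choice(["hidden", "output"]) == "hidden":
        row = rng.choice(weights["hidden"])
    else:
        row = weights["output"]
    index = rng.randrange(len(row))
    before = loss(weights, dataset)
    old_value = row[index]
    row[index] = old_value + rng.uniform(-step, step)
    if loss(weights, dataset) >= before:
        row[index] = old_value  # no improvement, so undo the twist


rng = random.Random(0)
# Toy dataset (assumed for illustration): XOR of two inputs.
dataset = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0), ([1.0, 0.0], 1.0), ([1.0, 1.0], 0.0)]
weights = {
    "hidden": [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(4)],
    "output": [rng.uniform(-1, 1) for _ in range(5)],
}

initial_loss = loss(weights, dataset)
for _ in range(2000):
    twist_one_weight(weights, dataset, rng)
final_loss = loss(weights, dataset)
print(initial_loss, final_loss)
```

Because a twist is only kept when it lowers the loss, the loss can never go up. It is a slow way to train, which is exactly what motivates gradients later.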

Then, teach the intuitions behind derivatives and gradient descent. You don't need the entirety of calculus; there's no benefit to knowing how a sequence or limit works if you only want to understand neural networks. With autograd, you won't be manually doing derivatives of weird functions either, so intuitive understanding is a lot more important than doing dozens of traditional calculus exercises on paper like it's the 1800s. You could probably explain the little bit of calculus you need in an hour or two, even to somebody with a 12-year-old's understanding of math and a good bit of programming knowledge.
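
As a rough illustration of how little calculus that is, the sketch below treats a derivative as "nudge the input a tiny bit and see how much the output moves", then uses that slope to walk downhill. The one-parameter `loss` function, the step sizes, and the function names are all made up for the example; a real network would get its slopes from autograd rather than from finite differences.

```python
def slope(f, x, tiny=1e-6):
    """Estimate the rate of change of f at x by nudging x a tiny bit both ways."""
    return (f(x + tiny) - f(x - tiny)) / (2 * tiny)


def loss(w):
    """Toy one-weight loss (assumed for illustration); its minimum is at w = 3."""
    return (w - 3.0) ** 2 + 1.0


w = 0.0
learning_rate = 0.1
for _ in range(100):
    # Gradient descent: step against the slope, so the loss goes down.
    w -= learning_rate * slope(loss, w)

print(w)
```

The same idea scales up: with many weights you have one slope per weight, and you step each weight against its own slope.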

Only when people understand training and inference, implemented with loops and descriptive variable names, teach the tensor, explain how modern CPUs and GPUs work (because many programmers still think a modern computer is just a much faster 6502), and then teach the tricks we use to make it fast.

  • I just assume that people who are going to do useful things in ML have a basic foundation in math and science. If you don’t know what a derivative is, what are we doing talking about multi-variable optimization?

    And it’s not about gate-keeping; it’s really about being able to reason about these concepts. What this looks like in programming is people memorizing a million clean code rules and not being able to write binary search.

    • When you learn calculus, you learn three things: the intuitions behind the concepts, the formal definitions of those concepts, and the techniques to efficiently solve problems using these concepts without a computer; things like integration by parts or by substitution.

      If what you want to understand is neural networks, even at a deep level, you need a very good intuitive grasp of what derivatives are (without necessarily understanding what a limit is; if you really want to show a definition, teach the infinitesimal). You also need to understand the rules of differentiation, which are relatively easy to explain once you have explained derivatives. You don't need other calculus concepts (like limits, sequences or integrals). You don't need the formal definitions. You don't need to solve large derivatives on paper, and you certainly don't need to be fast at it and be able to do it in a closed-book exam setting.

    • There's a wide gulf between knowing what a derivative is and proficiently working out the derivatives of arbitrary functions. The extent of understanding required for most applied ML is "rate of change".

So you created a new account to blatantly plagiarize another comment from this same page? What's even going on here?