Comment by wrs

1 year ago

Well. The word “linear” the way you use it doesn’t seem to have any particular meaning, certainly not the standard mathematical meaning, so I’m not sure we can make further progress on this explanation.

I’ll just reiterate that the single “technical” (whatever that means) nonlinearity in ReLU is exactly what lets a layer approximate any continuous[*] function.

[*] May have forgotten some more adjectives here needed for full precision.
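
A rough sketch of the approximation claim above, under stated assumptions: one hidden layer of ReLU units on a closed interval, with weights set by hand rather than trained. The names `fit_relu_layer`, `relu_net`, and `max_error` are hypothetical helpers made up for this illustration, and `math.sin` is just an arbitrary continuous target.

```python
import math


def relu(x):
    return max(0.0, x)


def fit_relu_layer(f, a, b, n):
    """Hand-build a one-hidden-layer ReLU net that matches f at n + 1
    evenly spaced knots on [a, b]. Returns (knots, coeffs, bias)."""
    knots = [a + (b - a) * i / n for i in range(n + 1)]
    values = [f(k) for k in knots]
    slopes = [
        (values[i + 1] - values[i]) / (knots[i + 1] - knots[i])
        for i in range(n)
    ]
    # The unit relu(x - knots[i]) switches on at knots[i] and changes the
    # slope of the output by coeffs[i] from that point onward.
    coeffs = [slopes[0]] + [slopes[i] - slopes[i - 1] for i in range(1, n)]
    return knots[:n], coeffs, values[0]


def relu_net(x, knots, coeffs, bias):
    """Output of the net: a bias plus a weighted sum of shifted ReLUs."""
    return bias + sum(c * relu(x - k) for k, c in zip(knots, coeffs))


def max_error(f, a, b, n, samples=1000):
    """Largest gap between f and the n-unit ReLU net on a sample grid."""
    knots, coeffs, bias = fit_relu_layer(f, a, b, n)
    xs = [a + (b - a) * j / samples for j in range(samples + 1)]
    return max(abs(f(x) - relu_net(x, knots, coeffs, bias)) for x in xs)


coarse = max_error(math.sin, 0.0, 2 * math.pi, 4)
fine = max_error(math.sin, 0.0, 2 * math.pi, 64)
print(coarse, fine)  # the error shrinks as hidden units are added
```

Replacing `relu` with the identity collapses the whole sum into a single straight line no matter how many units are used, which is why the kink matters.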

If you're confused, just show a tanh graph and a ReLU graph to a 7-year-old child and ask which one is linear. They'll get it right. So you're not confused in the slightest bit about anything I've said. There's nothing even slightly confusing about saying a ReLU is made of two lines.

  • Well, 7-year-olds don’t know a lot of math, typically, so I wouldn’t ask one that question. “Linear” has a very precise mathematical definition, which is not “made of some straight lines”, that when used properly enables entire fields of endeavor.

    It would be less confusing if you chose a different word, or at least defined the ones you’re using. In fact, if you tried to precisely express what you mean by saying something is “more linear”, that might be a really interesting exploration.

    • It's perfectly legitimate to discuss the linear aspects of piecewise linear functions. I've heard Andrej Karpathy do it in precisely the same way I did on this thread, talking about ReLU.

      We just have a lot of very pedantic types on HN who intentionally misinterpret other people's posts in order to have something to "disprove".

  • I.e. ReLU is _piecewise_ linear. The kink at zero that separates the 2 pieces (a discontinuity in the slope, not in the function itself) is precisely what makes it nonlinear, and that is what enables the actual universal approximation.

    • Which is what I said two replies ago.

      Followed by "in some sense it's [ReLU] still even MORE linear than tanh or sigmoid functions are". There's no way you misunderstood that sentence, or took it as my "definition" of linearity...so I guess you just wanted to reaffirm I was correct, again, so thanks.
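
For readers following the disagreement about the word "linear", here is a small sketch of the standard definition being referred to: a function f is linear if f(x + y) = f(x) + f(y) and f(a·x) = a·f(x) for every scalar a. The helpers `is_additive` and `is_homogeneous` are hypothetical names for this illustration, and they only test a handful of sample points, so a `True` result is evidence rather than proof. The last check is one possible way, not necessarily the one intended in the thread, to make "more linear than tanh" precise: ReLU keeps the scaling property for non-negative scalars, while tanh does not.

```python
import math


def relu(x):
    return max(0.0, x)


def is_additive(f, pairs, tol=1e-12):
    """Check f(x + y) == f(x) + f(y) on the given sample pairs."""
    return all(abs(f(x + y) - (f(x) + f(y))) <= tol for x, y in pairs)


def is_homogeneous(f, scalars, points, tol=1e-12):
    """Check f(a * x) == a * f(x) for the given scalars and points."""
    return all(
        abs(f(a * x) - a * f(x)) <= tol for a in scalars for x in points
    )


points = [-2.0, -0.5, 0.0, 0.5, 2.0]
pairs = [(x, y) for x in points for y in points]
non_negative = [0.0, 0.5, 3.0]

# ReLU fails additivity: relu(1 + (-1)) is 0, but relu(1) + relu(-1) is 1.
relu_additive = is_additive(relu, pairs)
# ReLU fails scaling by negative numbers: relu(-1 * 2) is 0, not -relu(2).
relu_negative_scaling = is_homogeneous(relu, [-1.0], points)
# ReLU does scale correctly by non-negative numbers; tanh does not.
relu_positive_scaling = is_homogeneous(relu, non_negative, points)
tanh_positive_scaling = is_homogeneous(math.tanh, non_negative, points)

print(relu_additive, relu_negative_scaling)
print(relu_positive_scaling, tanh_positive_scaling)
```

Under the standard definition both ReLU and tanh are simply nonlinear. The partial property that ReLU keeps is usually called positive homogeneity.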