Comment by quantadev

1 year ago

If you're confused just show a tanh graph and a ReLU graph to a 7 year old child and ask which one is linear. They'll all get it right. So you're not confused in the slightest bit about anything I've said. There's nothing even slightly confusing about saying a ReLU is made of two lines.

Well, 7-year-olds don’t know a lot of math, typically, so I wouldn’t ask one that question. “Linear” has a very precise mathematical definition, which is not “made of some straight lines”, that when used properly enables entire fields of endeavor.
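For reference, a sketch of the definition being pointed at here, taking the usual linear-algebra sense of "linear" over the reals (an assumption about which definition is meant; the symbols f, x, y, c are just illustrative). The last line shows ReLU failing additivity with one concrete pair of inputs:

```latex
\begin{align*}
  &\text{$f$ is linear if, for all } x, y, c \in \mathbb{R}: \\
  &\qquad f(x + y) = f(x) + f(y), \qquad f(c\,x) = c\,f(x). \\[4pt]
  &\text{With } \operatorname{ReLU}(x) = \max(0, x): \\
  &\qquad \operatorname{ReLU}(1 + (-1)) = 0 \;\neq\; 1 = \operatorname{ReLU}(1) + \operatorname{ReLU}(-1).
\end{align*}
```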

It would be less confusing if you chose a different word, or at least defined the ones you’re using. In fact, if you tried to precisely express what you mean by saying something is “more linear”, that might be a really interesting exploration.

  • It's perfectly legitimate to discuss the linear aspects of piecewise linear functions. I've heard Andrej Karpathy do it in precisely the same way I did on this thread, talking about ReLU.

    We just have a lot of very pedantic types on HN who intentionally misinterpret other people's posts in order to have something to "disprove".

I.e. ReLU is _piecewise_ linear. The kink at zero that separates the 2 pieces (the function itself is continuous there; it's the slope that jumps) is precisely what makes it non linear. Which is what enables the actual universal approximation.
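A minimal sketch of that point in plain Python, in case it helps. The helper names (`relu`, `is_additive`) and the step size `eps` are made up for illustration. It checks that additivity holds while both inputs stay on the same side of zero, fails once a sum straddles zero, and that the one-sided slopes at zero differ:

```python
def relu(x: float) -> float:
    """ReLU: zero for negative inputs, identity for non-negative inputs."""
    return max(0.0, x)


def is_additive(f, a: float, b: float, tol: float = 1e-12) -> bool:
    """Check f(a + b) == f(a) + f(b) for one pair of inputs."""
    return abs(f(a + b) - (f(a) + f(b))) <= tol


# Both inputs on the same piece: ReLU behaves like a linear map there.
same_side_pos = is_additive(relu, 2.0, 3.0)
same_side_neg = is_additive(relu, -2.0, -3.0)

# Inputs on opposite sides of zero: relu(0) = 0, but relu(1) + relu(-1) = 1.
across_zero = is_additive(relu, 1.0, -1.0)

# One-sided difference quotients at zero: the slope jumps between the pieces.
eps = 1e-6  # hypothetical step size, chosen only for illustration
left_slope = (relu(0.0) - relu(-eps)) / eps
right_slope = (relu(eps) - relu(0.0)) / eps

print(same_side_pos, same_side_neg, across_zero, left_slope, right_slope)
```

Passing on a few sample pairs doesn't prove anything about each piece, of course; the single failing pair is what shows ReLU isn't linear as a whole.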

  • Which is what I said two replies ago.

    Followed by "in some sense it's [ReLU] still even MORE linear than tanh or sigmoid functions are". There's no way you misunderstood that sentence, or took it as my "definition" of linearity...so I guess you just wanted to reaffirm I was correct, again, so thanks.