← Back to context

Comment by mickg10

1 year ago

I.e. ReLU is _piecewise_ linear. That discontinuity that separates the 2 pieces is precisely what makes it non linear. Which is what enables the actual universal approximation.

Which is what I said two replies ago.

Followed by "in some sense it's [ReLU] still even MORE linear than tanh or sigmoid functions are". There's no way you misunderstood that sentence, or took it as my "definition" of linearity...so I guess you just wanted to reaffirm I was correct, again, so thanks.