← Back to context

Comment by eru

1 year ago

Well, you also need an approach to 'curve fitting' where it's actually computationally feasible to fit the curve. The approach of mixing layers of matrix multiplication with a simple non-linearity like max(0, x) (ReLU) works really well for that. Earlier on they tried more complicated non-linearities, like sigmoids, or you could try an arbitrary curve that's not split into layers at all, you would probably find it harder. (But I'm fairly sure in the end you might end up in the same place, just after lots more computation spent on fitting.)