Comment by halflings

9 years ago

I assume that multiplying the gradient by a constant factor shouldn't matter, since the learning rate already multiplies the gradient and can absorb any such constant. This might just mean that the learning rate should be lower or higher with this method.
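To make that concrete, here is a minimal sketch assuming plain gradient descent on a toy one-dimensional loss (the names `grad_loss`, `sgd_step` and `run` are made up for illustration, not from any library). Halving the gradient while doubling the learning rate should give the same trajectory:

```python
def grad_loss(w):
    # Gradient of the toy loss (w - 3)^2; the loss itself is an assumption.
    return 2.0 * (w - 3.0)


def sgd_step(w, grad, lr):
    # Plain gradient descent update: the learning rate multiplies the gradient.
    return w - lr * grad


def run(scale, lr, steps=20, w=0.0):
    # Apply `steps` updates, with the gradient multiplied by a constant `scale`.
    for _ in range(steps):
        w = sgd_step(w, scale * grad_loss(w), lr)
    return w


w_plain = run(scale=1.0, lr=0.1)   # unscaled gradient
w_scaled = run(scale=0.5, lr=0.2)  # gradient halved, learning rate doubled

# The two runs should agree up to floating-point rounding.
print(abs(w_plain - w_scaled) < 1e-12)
```

This only covers the plain update rule; optimizers that rescale the gradient themselves may respond differently to a constant factor.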

The real question is then which method makes the parameters easier to tune, or which one is the most intuitive.