Comment by appplication

11 hours ago

Not a mathematician so I’m immediately out of my depth here (and butchering terminology), but it seems, intuitively, like the presence of a massive number of local minima wouldn’t really be relevant for gradient descent. A given local minimum would need a “well” at least as large as your step size to reasonably capture your descent.

E.g. you could land perfectly on a local minimum, but you won’t stay there unless your step size is minute or the minimum is quite substantial.
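To make that concrete, here’s a minimal sketch (my own toy example, not from the thread): gradient descent on f(x) = x² + 0.05·sin(50x), a parabola whose ripples create many narrow local minima, each well only ~0.13 wide. A tiny step size is captured by the first well it meets; a step size larger than the wells hops straight over them:

```python
import math

def grad(x):
    # f(x) = x**2 + 0.05*sin(50x)  ->  f'(x) = 2x + 2.5*cos(50x)
    # The parabola pulls toward 0; the cosine ripple creates many
    # narrow local minima for |x| < 1.25, spaced roughly 0.13 apart.
    return 2 * x + 2.5 * math.cos(50 * x)

def descend(lr, x0=1.0, steps=5000):
    x, tail = x0, []
    for t in range(steps):
        x -= lr * grad(x)
        if t >= steps - 500:
            tail.append(x)          # average the last iterates
    return sum(tail) / len(tail)

print("tiny steps  (lr=0.002):", descend(0.002))  # trapped in a ripple well near x0
print("large steps (lr=0.05): ", descend(0.05))   # hops the ripples, ends up near 0
```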

The randomness (and exploration) introduced by mini-batch training also helps avoid 'real' minima, if they exist.
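Same toy landscape as above, but with zero-mean noise added to each gradient as a crude stand-in for mini-batch sampling noise (again just an illustrative sketch with made-up constants): the noiseless run stays trapped in its ripple well, while the noisy run gets kicked from well to well and drifts into the global basin:

```python
import math, random

def grad(x):
    # Same toy landscape: f'(x) = 2x + 2.5*cos(50x)
    return 2 * x + 2.5 * math.cos(50 * x)

def sgd(lr, noise=0.0, x0=1.0, steps=20000, seed=0):
    rng = random.Random(seed)
    x, tail = x0, []
    for t in range(steps):
        g = grad(x) + noise * rng.gauss(0, 1)  # noisy gradient estimate
        x -= lr * g
        if t >= steps - 2000:
            tail.append(x)                     # average out the jitter
    return sum(tail) / len(tail)

print("no noise:  ", sgd(0.002))              # stuck in a ripple well near x ≈ 0.96
print("with noise:", sgd(0.002, noise=20.0))  # escapes, hovers near the global minimum
```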

I believe what was meant was that, assuming local minima large enough to capture your probe, a sufficiently high density of them makes you extremely likely to get stuck in one. A counterpoint regarding dimensionality is made by the comment adjacent to yours.