Comment by necroforest

3 years ago

which jargon here is "just for show"?

6 comments

necroforest

> we show that over-parameterization catalyzes global convergence by ensuring the feasibility of the SVM problem and by guaranteeing a benign optimization landscape devoid of stationary points

does this mean 'an over-parameterized transformer problem is a convex SVM problem'?

tensor 3 years ago
The irony is that your "simplification" uses even more "jargon."
But yes, that's how I would read that, and I also see no issue at all with the language in the paper. These terms are used for precision, and have meaning to those in the field. Papers are written for other experts, not laymen.
- ftxbro 3 years ago
  
  OK, but why do they write "benign optimization landscape devoid of stationary points" instead of "convex", other than "just for show"? In my understanding it's not better for either audience, experts or laymen. For experts it would be clearer to just say convex, and they would know the implications; and if someone doesn't know what convex means, they probably also aren't going to be on board with 'stationary points'. Also, I'm not trying to pick on the authors; I'm just trying to answer the question of which specific parts could be seen as 'just for show'.
  
dongecko 3 years ago

I read it the same way as you did, or at least it's an approximation.
In general that's not really surprising. I remember discussions from some years ago about larger networks leading to smoother loss surfaces.