Comment by geokon

3 years ago

I'm not an expert in this, so hopefully someone more knowledgeable can weigh in - but SVMs are much better understood from the perspective of overfitting and generalization results like the VC bound, while Transformers are not understood nearly as well. From what I remember it's quite easy to have an SVM overfit, while Transformers have fewer issues. It'd be interesting to understand why.

So if the two are somehow connected, that could have implications for tuning and for fighting overfitting.

Maybe it'd also be possible to design SVMs that are more resistant to overfitting.

> From what I remember it's quite easy to have an SVM overfit ... It'd be interesting to understand why

SVMs with well-tuned kernels and regularization are reasonably resistant to overfitting. The problem is that you can easily end up overfitting the *hyperparameters* if you're not careful about how you do performance testing: if you select the kernel and regularization settings by repeatedly evaluating on the same validation data, the best-looking score is inflated by the selection process itself and won't hold up on fresh data.
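To make the hyperparameter-overfitting point concrete, here's a minimal toy simulation (not a real SVM - the "hyperparameter settings" are just random classifiers on random labels, so nothing can genuinely beat 50% accuracy). Selecting the best of many settings on one validation set produces an inflated score that vanishes on a fresh test set:

```python
import random

random.seed(0)

N_VAL = 100      # validation-set size
N_CONFIGS = 200  # number of hyperparameter settings "tried"

# Ground-truth labels are pure coin flips, so NO setting can genuinely
# do better than 50% accuracy out of sample.
val_labels = [random.randint(0, 1) for _ in range(N_VAL)]

def accuracy(preds, labels):
    return sum(p == t for p, t in zip(preds, labels)) / len(labels)

# "Tune" by trying many settings and keeping the one with the best
# validation score. Each setting's predictions are random, standing in
# for hyperparameters with no real predictive value.
best_val, best_preds = 0.0, None
for _ in range(N_CONFIGS):
    preds = [random.randint(0, 1) for _ in range(N_VAL)]
    score = accuracy(preds, val_labels)
    if score > best_val:
        best_val, best_preds = score, preds

# Evaluate the selected setting on a FRESH test set: the inflated
# validation score does not carry over.
test_labels = [random.randint(0, 1) for _ in range(N_VAL)]
test_acc = accuracy(best_preds, test_labels)

print(f"best validation accuracy: {best_val:.2f}")
print(f"accuracy on a fresh test set: {test_acc:.2f}")
```

The standard fix is to keep a final test set (or an outer cross-validation loop) that the hyperparameter search never touches, e.g. nested cross-validation as supported by tools like scikit-learn's `GridSearchCV`.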