← Back to context

Comment by godelski

2 years ago

Definitely. Problem is that doing this helps you get published, not hurts. I think this is why there's often confusion when industry tries to use academic models, as they don't generalize well due to this overfitting. But also, evaluation is fucking hard, and there's just no way around that. Trying to make it easy (i.e. benchmarkism) just adds up creating more noise instead of the intended decrease.