Comment by svantana

2 years ago

The trick is to hide the answers to the test data with an authority that only reports your score, like Kaggle does, and then allow only a single submission per model, so that information about the test labels can't leak back through repeated queries. I find it a bit sad that this practice has fallen by the wayside, as it went pretty mainstream within the research community with the Netflix Prize back in 2009.
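For concreteness, here's a minimal sketch of what such a blind-evaluation authority does (the class name, the one-submission rule, and the plain accuracy metric are my own illustrative assumptions, not Kaggle's actual API): the server holds the hidden labels and only ever returns an aggregate score.

```python
import hashlib

class BlindScorer:
    """Sketch of a blind-evaluation authority: it holds the hidden test
    labels and only ever reveals an aggregate score, never the labels.
    (Hypothetical -- not how Kaggle is actually implemented.)"""

    def __init__(self, hidden_labels):
        self.hidden_labels = hidden_labels
        self.scored_models = set()  # used to enforce one submission per model

    def score(self, model_id: str, predictions) -> float:
        # One submission per model, so the test set can't be probed
        # by resubmitting small variations of the same model.
        key = hashlib.sha256(model_id.encode()).hexdigest()
        if key in self.scored_models:
            raise ValueError(f"model {model_id!r} has already been scored")
        if len(predictions) != len(self.hidden_labels):
            raise ValueError("prediction count does not match test set size")
        self.scored_models.add(key)
        correct = sum(p == y for p, y in zip(predictions, self.hidden_labels))
        return correct / len(self.hidden_labels)  # only the aggregate leaks out
```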

I wonder if techniques from differential privacy could be helpful here (in terms of the multiple-querying problem).
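They likely could: this is essentially the "reusable holdout" idea (Dwork et al., Science 2015). Their Thresholdout mechanism answers each query with the training score unless the holdout disagrees beyond a noisy threshold, and only then reveals a noised holdout score. Here's a rough sketch; the threshold, noise scale, and budget accounting are simplified from the paper (which also refreshes the noisy threshold after each overfitting detection):

```python
import numpy as np

def thresholdout(train_score, holdout_score, budget,
                 threshold=0.04, sigma=0.01, rng=None):
    """One Thresholdout query (simplified sketch after Dwork et al. 2015).

    Returns (reported_score, remaining_budget). A noised holdout score is
    revealed only when it disagrees noticeably with the training score;
    otherwise the training score is echoed back and no budget is spent.
    """
    if budget <= 0:
        raise RuntimeError("holdout budget exhausted; collect fresh data")
    rng = rng if rng is not None else np.random.default_rng()
    # Noisy comparison: does this query look overfit to the training set?
    if abs(train_score - holdout_score) > threshold + rng.laplace(scale=2 * sigma):
        # Overfitting detected: charge the budget, reveal a noised holdout score.
        return holdout_score + rng.laplace(scale=sigma), budget - 1
    # Scores agree, so the training score is a safe answer that reveals
    # nothing new about the holdout set.
    return train_score, budget
```

The point is that answers which merely echo the training score leak nothing about the holdout, and the Laplace noise on the rest gives differential-privacy-style composition bounds, so the holdout stays statistically valid across many adaptive queries instead of just one.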