Comment by svantana
2 years ago
The trick is to hide the answers to the test data behind an authority that only reports your score, the way Kaggle does, and then allow only a single submission per model, so that information about the test labels can't leak back into model selection. I find it a bit sad that this practice has fallen by the wayside, since it went pretty mainstream in the research community with the Netflix Prize, which ran from 2006 to 2009.
I wonder if techniques from differential privacy could help with the multiple-query problem here (rough sketch of the idea below).
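For a concrete flavor of what that might look like: here's a minimal sketch of a score oracle that adds Laplace noise to every reported result, loosely in the spirit of Dwork et al.'s "reusable holdout" (Thresholdout). All the names and parameters (NoisyLeaderboard, scale, budget) are made up for illustration; a real mechanism would need an actual privacy analysis.

    import numpy as np

    class NoisyLeaderboard:
        # Hypothetical score oracle: holds the hidden test labels and
        # reports only a Laplace-noised accuracy, refusing queries once
        # a fixed budget is spent. Illustrative, not a vetted DP mechanism.
        def __init__(self, y_true, scale=0.01, budget=100):
            self.y_true = np.asarray(y_true)
            self.scale = scale    # noise scale: privacy vs. score-accuracy trade-off
            self.budget = budget  # maximum number of submissions allowed

        def score(self, y_pred):
            if self.budget <= 0:
                raise RuntimeError("submission budget exhausted")
            self.budget -= 1
            acc = float(np.mean(np.asarray(y_pred) == self.y_true))
            # Noise each reported score so repeated queries reveal less
            # about the individual test labels.
            return acc + np.random.laplace(loc=0.0, scale=self.scale)

    # Example: the organizer keeps y_true secret; competitors only
    # ever see noised scores.
    lb = NoisyLeaderboard(y_true=[0, 1, 1, 0, 1], scale=0.02, budget=10)
    print(lb.score([0, 1, 0, 0, 1]))  # ~0.8 plus noise, varies per call

The real Thresholdout mechanism is more careful than this (it compares training and holdout scores and only spends budget when they disagree), but the basic idea of answering every query through noise is the same.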