Comment by dguest
9 hours ago
The LHC has moved on a bit since then. Here's an open dataset that one collaboration used to train a transformer:
https://opendata-qa.cern.ch/record/93940
if you can beat it with linear regression we'd be happy to know.
Thanks.
The paper [1] referenced in your link follows the legacy of the paper on the HIGGS dataset and does not report quantities like accuracy or perplexity. The HIGGS dataset paper provided the area under the ROC curve, from which accuracy had to be approximated. I therefore used the accuracy from the ADMM paper [2] to compare my results against. As I checked later, the ROC AUC in [1] mostly agrees with the SGD training results on HIGGS in [2].
I think perplexity is the appropriate measure in [1] because we need to discern between three outcomes, which calls for a softmax and for perplexity as the standard measure.
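For concreteness, here is a minimal sketch of how perplexity would be computed for a three-class softmax classifier: it is the exponential of the mean cross-entropy over the evaluation set. The function and variable names are illustrative, not taken from [1].

```python
import math

def perplexity(probs, labels):
    """exp of the mean negative log-likelihood over the dataset.

    probs  -- list of predicted class-probability vectors (softmax outputs)
    labels -- list of true class indices
    """
    nll = -sum(math.log(p[y]) for p, y in zip(probs, labels)) / len(labels)
    return math.exp(nll)

# Sanity check: a uniform three-class predictor has perplexity exactly 3,
# so any trained model should land between 1 (perfect) and 3 (uninformed).
uniform = [[1 / 3, 1 / 3, 1 / 3]] * 4
print(perplexity(uniform, [0, 1, 2, 0]))
```

So for a three-outcome tagging task, 3.0 is the natural "no skill" baseline to beat, and 1.0 the unreachable floor.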
So, my questions are: 1) what perplexity should I target on the "mc-flavtag-ttbar-small" dataset? And 2) what train/validation/test split is used there?