Comment by mountainriver
5 days ago
That's not a bad idea. It's very expensive though, and you end up with a pretty useless model in most regards.
A lot of the trusted benchmarks today are somewhat dynamic or have a hidden set.
That could happen. One would need to accept that risk to take the approach. However, if it was trained only on legally obtained data, then there might be a market for it among those who don't want to risk copyright infringement. Think FairlyTrained.org.
"somewhat dynamic or have a hidden set"
Are there example inputs and outputs for the dynamic ones online? And are the hidden sets online? (I haven't looked at benchmark internals in a while.)