Comment by modeless
6 days ago
It's only a "debacle" if you already assume OpenAI isn't trustworthy, because they said they don't train on the test set. I hope you can see that presenting your belief that they lied about training on the test set as evidence of them being untrustworthy is a circular argument. You're assuming the thing you're trying to prove.
The one OpenAI "scandal" that I did agree with was the thing where they threatened to cancel people's vested equity if they didn't sign a non-disparagement agreement. They did apologize for that one and make changes. But it doesn't have a lot to do with their research claims.
I'm open to actual evidence that OpenAI's research claims are untrustworthy, but again, I also judge people individually, not just by the organization they belong to.
They funded the entire benchmark and didn’t disclose their involvement. They then proceeded to make use of the benchmark while pretending like they weren’t affiliated with EpochAI. That’s a huge omission and more than enough reason to distrust their claims.
IMO their involvement is only an issue if they gained an advantage on the benchmark by it. If they didn't train on the test set then their gained advantage is minimal and I don't see a big problem with it nor do I see an obligation to disclose. Especially since there is a hold-out set that OpenAI doesn't have access to, which can detect any malfeasance.
It's typically difficult to find direct evidence for bias. That is why rules for conflict of interest and disclosure are strict in research and academia. Crucially, something is a conflict of interest if it could be perceived as a conflict of interest by someone external, so it doesn't matter if you think you could judge fairly, it's important if someone else might doubt you could.
Not disclosing a conflict of interest is generally considered a significant ethics violation, because it reduces trust in the general scientific/research system. Thus OpenAI has become untrustworthy in many people's view irrespective if their involvement with the benchmarks creation affected their results or not.
There’s no way to figure out whether they gained an advantage. We have to trust their claims, which again, is an issue for me after finding out they already lied.
2 replies →