Comment by EvgeniyZh

6 months ago

> Attempting to justify potential misconduct through semantic technicalities ("well, validation isn't technically training")

Validation is not training, period. I'll ask again: what is the possible goal of accessing the evaluation set if you don't plan to use it for anything except the final evaluation, which is what a test set is for? Either they asked for access with no intent to use the provided data for anything beyond final evaluation, which can be done without access, or they did utilize the provided data somehow: by training on it (which they verbally promised not to do), using it as a validation set, using it to create a similar training set, or something else.

> This directly contradicts established principles of scientific integrity where the spirit of agreements matters as much as their letter.

OpenAI is not doing science; they are doing business.

> This represents a stark logical reversal. The initial argument assumed benchmark manipulation would be meaningful enough to influence investors and industry perception. Now, when challenged, the same metrics are suddenly "meaningless." This is fundamentally inconsistent - either the metrics matter (in which case manipulation would be serious misconduct) or they don't (in which case there's no incentive to manipulate them).

The metrics matter to people, but this doesn't mean people can meaningfully predict the model's performance from them. If I were to label each of your arguments with a demagogic technique (you'll probably call that ad hominem or something), I'd call this one a false dichotomy: it is entirely possible for metrics to be too imprecise to predict performance and for people to care about them at the same time.

> The attempted simultaneous appeal to and dismissal of credentials

I'm not appealing to credentials. Based on what I wrote, you made a wrong guess about my credentials, and I pointed out your mistake.

> at this point, the argument OpenAI did something rests on unfalsifiable claims about the industry as a whole, claiming insider knowledge, while avoiding any verifiable evidence.

Your position, on the other hand, rests on the assumption that corporations behave ethically and with integrity beyond what is required by the law (and, specifically, their contracts with other entities).

> Validation is not training, period.

Sure, but what we care about isn't the semantics of the words, it's the effects of what they're doing. Iterated validation plus humans doing hyperparameter tuning will go a long way toward making a model fit the data, even if you never technically run backprop with the validation set as input.
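
To make that concrete, here's a toy sketch (made-up numbers and names, not anyone's actual process) of how repeatedly picking whichever configuration scores best on a reused held-out set leaks information about that set into the final model, even though no gradient ever touches it:

```python
import random

random.seed(0)

# Each candidate config has a "true" skill, plus a fixed quirk: how well it
# happens to do on this particular held-out set beyond its true skill.
candidates = []
for lr in (1e-5, 3e-5, 1e-4, 3e-4, 1e-3):
    candidates.append({
        "lr": lr,
        "true_skill": random.uniform(0.60, 0.70),
        "heldout_quirk": random.gauss(0, 0.05),  # noise tied to the reused set
    })

# "Tuning" = keep whichever config scores best on the reused held-out set.
best = max(candidates, key=lambda c: c["true_skill"] + c["heldout_quirk"])

print("score on the reused held-out set:", round(best["true_skill"] + best["heldout_quirk"], 3))
print("true skill of the chosen config: ", round(best["true_skill"], 3))
# On average the first number exceeds the second: the selection step itself
# fit the quirks of the held-out set, with no backprop on it at all.
```

That's the whole worry: selection is a form of fitting, whatever you call the dataset.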

> OpenAI is not doing science; they are doing business.

Are you implying these are orthogonal? OpenAI is a business centered on an ML research lab, which does research, and which people in the research community have generally come to respect.

> at this point, the argument OpenAI did something rests on unfalsifiable claims about the industry as a whole, claiming insider knowledge, while avoiding any verifiable evidence.

No, it doesn't. What OP is doing is critiquing OpenAI for their misbehavior. This is one of the few levers we (who do not have ownership or a seat on their board) have to actually influence their future decision-making -- well-reasoned critiques can convince people here (including some who decide whether their company uses ChatGPT vs. Gemini vs. Claude vs. ...) that ChatGPT is not as good as the benchmarks might claim, which in effect makes it more expensive for OpenAI to condone this kind of misbehavior going forward.

The argument that "no companies are moral, so critiquing them is pointless" is just an indirect way of running cover for those same immoral companies.