
Comment by refulgentis

6 months ago

> I'd argue here the more relevant point is "these specific people have been shown to have done it before."

This is itself a slippery move. A vague gesture at past misconduct without actually specifying any incidents. If there's a clear pattern of documented benchmark manipulation, name it. Which benchmarks? When? What was the evidence? Without specifics, this is just trading one form of handwaving ("everyone does it") for another ("they did it before").

> "AI" here is not being treated as an actual science would be.

There's some truth here, but also some sleight of hand. Yes, AI development often moves outside traditional academic channels. But you imply this automatically means less rigor, which doesn't follow. Many industry labs have internal review processes, replication requirements, and validation procedures that can be as stringent as, or more stringent than, academic peer review. The fact that something isn't in Nature doesn't automatically make it less rigorous.

> The majority of the papers pumped out of these places are not real concrete research, not submitted to journals, and not peer reviewed works.

This combines three questionable implications:

- That non-journal publications are automatically "not real concrete research" (tell that to physics/math arXiv)

- That peer review is binary - either traditional journal review or nothing (ignoring internal review processes, community peer review, public replications)

- That volume ("pumped out") correlates with quality

You're making a valid critique of AI's departure from traditional academic structures, but then making an unjustified leap to assuming this means no rigor at all. It's like saying because a restaurant isn't Michelin-starred, it must have no food safety standards.

This also ignores the massive reputational and financial stakes that create strong incentives for internal rigor. Major labs have to maintain credibility with:

- Their own employees.

- Other researchers who will try to replicate results.

- Partners integrating their technology.

- Investors doing technical due diligence.

- Regulators scrutinizing their claims.

The idea that they would casually risk all that just to bump up one benchmark number (but not too much! just from 10% to 35%) doesn't align with the actual incentive structure these organizations face.

Both the original comment and this one fall into the same trap: mistaking cynicism for sophistication while displaying a somewhat superficial understanding of how modern AI research and development actually operates.

This reply reads as though it were AI-generated.

Let's bite, though, and hope that unhelpful, excessively long-winded replies are just your quirk.

> This is itself a slippery move. A vague gesture at past misconduct without actually specifying any incidents. If there's a clear pattern of documented benchmark manipulation, name it. Which benchmarks? When? What was the evidence? Without specifics, this is just trading one form of handwaving ("everyone does it") for another ("they did it before").

Ok, provide specifics yourself then. Someone replied and pointed out that they have every incentive to cheat, and your response was:

> This starts with a fallacious appeal to cynicism combined with an unsubstantiated claim about widespread misconduct. The "everybody does it" argument is a classic rationalization that doesn't actually justify anything. It also misunderstands the reputational and technical stakes - major labs face intense scrutiny of their methods and results, and there's plenty of incestuous movement between labs and plenty of leaks.

Respond to the content of the argument -- be specific. WHY is OpenAI not incentivized to cheat on this benchmark? Why is a once-nonprofit that turned from releasing open and transparent models to closed models and began raking in tens of billions in investor cash not incentivized to continue making those investors happy? Be specific. Because there's a clear pattern of corporate behaviour at OpenAI and associated entities which suggests your take is not, in fact, the simpler viewpoint.

> This combines three questionable implications:
>
> - That non-journal publications are automatically "not real concrete research" (tell that to physics/math arXiv)

Yes, arXiv will host lots of stuff that isn't real concrete research. They've hosted April Fool's jokes, for example.[1]

> - That peer review is binary - either traditional journal review or nothing (ignoring internal review processes, community peer review, public replications)

This is a poor/incorrect reading of the language. You have inferred meaning that does not exist. If citations are so important here, cite a few dozen that are peer reviewed out of the hundreds.

> - That volume ("pumped out") correlates with quality

Incorrect reading again. Volume here correlates with marketing and hype. It could have an effect on quality but that wasn't the purpose behind the language.

> You're making a valid critique of AI's departure from traditional academic structures, but then making an unjustified leap to assuming this means no rigor at all. It's like saying because a restaurant isn't Michelin-starred, it must have no food safety standards.

Why is that unjustified? It's no different from people with science backgrounds who have fallen into flat-earther beliefs. They may understand the methods, but if their claims are not tested with rigor and they have abandoned scientific principles, they do not get to keep pretending it's as valid as actual science.

> This also ignores the massive reputational and financial stakes that create strong incentives for internal rigor. Major labs have to maintain credibility with:

FWIW, this regurgitated talking point is what makes me believe this is an LLM-generated reply. OpenAI is not a major research lab. They appear essentially to be trading on the names of more respected institutions and the mathematicians who came up with FrontierMath. The credibility damage here can be done by a single person sharing data with OpenAI, unbeknownst to the individual participants.

Separately, even under correct conditions, it's not as if there aren't all manner of problems in science in terms of ethical review. See, for example, [2].

[1] https://arxiv.org/abs/2003.13879 - FWIW, I'm not against scientists having fun, but it should be understood that arXiv is basically three steps above HN or reddit.

[2] https://news.ycombinator.com/item?id=26887670

  • First paragraph is unnecessarily personal.

    It's also confusing: Did you think it was AI because of the "regurgitated talking point", as you say later, or because it was an "unhelpful, excessively long-winded repl[y]"?

    I'll take the whole thing as an intemperate moment, and read the intended message as "I'd love to argue about this more, but can you cut down reply length?"

    > Ok, provide specifics yourself then.

    Pointing out that "Everyone does $X" is fallacious does not imply you have to prove no one has any incentive to do $X. There are plenty of things you have an incentive to do that I trust you won't do. :)

    > If citations are so important here, cite a few dozen that are peer reviewed out of the hundreds.

    Sure.

    I got a bit lost, though: a few dozen of what?

    Are you asking for a set of peer-reviewed journal articles about AI that aren't on arXiv?

    > Why is that unjustified?

    "$X doesn't follow traditional academic structures" does not imply "$X has no rigor at all"

    > OpenAI is not a major research lab.

    Eep.

    > "all manner of problems in science in terms of ethical review. "

    Yup!

    The last two on my part are short because I'm not sure how to reply to "entity $A has a short-term incentive to do thing $X, and entity $A is part of large group $B that sometimes does thing $X". We don't disagree there! I'm just applying symbolic logic to the rest. Ex.: when I say "$X does not imply $Y", that has a very definite, field-specific meaning.

    It's fine to feel the way you do. It takes a rigorously rational process to end up making my argument, but "rigorously" is too kind a word: that level of rigor would be crippling in daily life.

    A clear warning sign for me, setting aside the personal-attack opening, would have been when I was doing things like "arXiv has April Fool's jokes!" -- I like to think I would have taken a step back after noticing the argument was "OpenAI is distantly related to group $X, a member of group $X did $Y, therefore let's assume OpenAI did $Y and conversate from there."