Comment by harpastum

6 hours ago

Without providing definitions of "True / Mostly True / Misleading / False" to each rater, I rate the article's claim that "Only one verdict bucket can be correct per claim" as false.

Something can be simultaneously "misleading" and either true or false. Which category should something go in if it's "mostly false"?

How much can something be wrong before it goes from "mostly true" to "false" (objectively, both have some part of the fact that is not true)?

This is at least partly testing the model's definition of "mostly" and "misleading", not its understanding of the fact. Claiming that this means the models have a fundamental disagreement on the facts themselves is an overreach.

Yes, the labels are weird. Most misleading statements are true. Any "mostly true" statement is false.

I suspect the intention was "Factually true, and no gotchas exist", "technically not true, but so close to the truth that the difference doesn't matter", "technically true, but there are major gotchas", and "factually false and not even close". But that's not what they specified.

  • Better options would have been "True", "False", and "Unknown" (which opinions would fall under too). That would also give an interesting assessment of how well LLMs can identify missing information. My guess is there would be a very low number of "Unknown" verdicts and a much higher level of agreement (assuming equal representation), unless RLHF techniques have gotten better at getting an LLM to say "I don't know", which I doubt. Saying "I don't know" is not good for the dopamine release that keeps users coming back for more.

    • We initially tried a fifth bucket, Abstain. It was actually heavily used by some of the models. But it felt as if they were using it to "avoid" some of the hard questions, so we dropped the bucket to force them to provide a verdict.

    • I wouldn’t expect opinions to go into “unknown.” Maybe have an “it’s complicated” bucket.

If you can consistently construct "true but misleading" content, you may be qualified to work at a major newspaper.

  • As if right wing propaganda shows and manosphere blogs haven't been knocking those out of the park for the last decade+. Although I guess you could say flat out lies are more their jam. Newspapers at least require confirmed sources. You know, journalism.

> I guess the goal is to test the models and not the harness

Less important than the harness are the system/user prompts themselves (which, of course, are put in the harness), which are effectively what this study seems to be testing. With a better prompt, I'm sure the models would look more alike, as the biggest/best models have more or less identical strong prompt adherence in my experience.

> True / Mostly True / Misleading / False

> Which category should something go in if it's "mostly false"?

For some reason they have chosen to call that "Misleading" rather than a more symmetrical "Mostly False", but the intent seems clear enough.

> Something can be simultaneously "misleading" and either true or false. Which category should something go in if it's "mostly false"?

Disagree. The definition of misleading is a true fact that is presented in a way to lead you to a false conclusion.

Example: "Most good engineers are male". It is true as a consequence of most engineers being male in general, but it leads the reader to a potential false implication that an average man is better than an average woman.

This does not invalidate your point though. Things can be true and misleading.
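The base-rate arithmetic behind that example can be sketched in a few lines. This is only an illustration: the function name and all numbers (an 80/20 split, a 30% "good" rate in both groups) are hypothetical and not taken from the article or any dataset.

```python
def share_of_good_in_group(group_share: float,
                           good_rate_group: float,
                           good_rate_other: float) -> float:
    """Fraction of all 'good' members who belong to the group.

    group_share      -- fraction of the population in the group (hypothetical)
    good_rate_group  -- fraction of the group rated 'good' (hypothetical)
    good_rate_other  -- fraction of everyone else rated 'good' (hypothetical)
    """
    good_in_group = group_share * good_rate_group
    good_in_other = (1 - group_share) * good_rate_other
    return good_in_group / (good_in_group + good_in_other)


# Hypothetical inputs: the group is 80% of the population and both groups
# have exactly the same 30% rate of being 'good'.
equal_rates = share_of_good_in_group(0.8, 0.3, 0.3)
print(f"share of 'good' members in the group: {equal_rates:.2f}")
```

With identical rates in both groups, the result simply equals the group's share of the population, so a "most good X are Y" claim can be literally true while saying nothing about any difference between the groups.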

  • > The definition of misleading is a true fact that is presented in a way to lead you to a false conclusion.

    Merriam-Webster defines "mislead" as follows:

      1. (transitive verb) to lead in a wrong direction or into a mistaken action or belief often by deliberate deceit
    
      2. (intransitive verb) to lead astray; give a wrong impression
    

    Presenting a "true fact" is optional when misleading someone.

    • Uh, you seem to be right. I can't check Oxford to confirm because there's a paywall, apparently.

      The mental model I've always been taught is:

      False, well intended -> mistake

      False, bad intention -> lie

      True, bad intention -> misleading

      Bad intention, regardless of truth -> deceitful

      The problem with classifying all bad-intentioned statements as misleading is that it leaves you without a way to express "true + bad intention", while for generic bad-intentioned statements, regardless of truth, we already have a word (deceit).

  • Isn't this still assuming we can even determine what is true or false?

    Newtonian physics is false, but it works well enough that we teach it in college. But our best models of physics are currently in disagreement, so can we even say they are true? Given the replication crisis, especially in the social sciences, how many peer-reviewed findings can be called true? Even experimental results can be false. Consider the studies that found FTL neutrinos: the result was rejected as an error in the experiment, which was eventually confirmed, but that took quite a lot of work. In a softer field than physics, with a claim less absurd than FTL, it would likely have long been accepted as a true finding.

    Even in math, basic statements aren't really true or false, but more a question of "given these axioms, can we prove or disprove it" noting that we have different systems with different axioms. If we are talking basic sets, most people are using naive set theory which is inherently contradictory, which means that notions like true or false probably can't be considered well defined.

    • Newtonian physics doesn't just work well enough for education. It provides an incredibly accurate and precise model of the world except at extremes. The majority of engineering does not necessitate using theories of relativity. Both theories are incomplete models approximating reality and are very far from being false.

    • In general communication, "true" and "false" are judged on the best available evidence and expertise: a statement is true if it contains no obvious contradictions or falsehoods under an optimistic parsing of meaning, language, and intent. Notably, this leaves out misleading or missing data, because those concerns are separate from truth and falsehood.

      E.g. if I say the earth is round we optimistically parse round to include oblate spheroid and rate it true.

      If I say that the earth is flat we rate it as false because there is no reasonable interpretation possible other than confusion or malice.

  • > but it leads the reader to a potential false implication that an average man is better than an average woman.

    I think that's _you_ turning the statement into something much broader than intended. The claim is about engineers and you're jumping from "men are better than women in engineering" to "men are better overall."

    To give a related example, "Most good NBA players are black." I don't think anyone would bother trying to couch this in a bunch of "well, for all we know that's just a function of more NBA players being black than white" arguments, nor would anyone be led to think "the average black man is better than the average white man" as a result of that statement. I _do_ agree, however, that there are some people who see rather narrowly defined statements and turn them into something they're not...

    • > I think that's _you_ turning the statement into something much broader than intended.

      My point is that it is possible for a reader to turn it that way, for a variety of reasons (lack of understanding of statistics, preexisting biases, or whatever). And that getting a reader to mistakenly generalize is the purpose of a misleading statement.

      To mislead is to direct someone into a falsehood by implication even though the literally expressed facts are all true. I'd say the writer's bad intentions are necessary to qualify something as misleading, for the same reason that not all false statements are lies: to be a lie, the speaker must know the statement is false and still use it. There are probably much better examples than the one I came up with on the fly, though.

    • At least Gemini 3.5 is fair about it:

          Classify this claim: "Most good engineers are male."
          Misleading
      
          Classify this claim: "Most bad engineers are male."
          Misleading
      

      And not particularly racially sensitive

          Classify this claim: "Most good NBA players are black."
          True
      
          Classify this claim: "Most good NHL players are white."
          True
      

      It explained it is more confident when assessing the small, highly quantifiable population of sports professionals vs a very large, diverse population of "engineers".

> Something can be simultaneously "misleading" and either true or false.

Sure they can. It might be a true fact that "100% of the murders committed in <town> over the last 25 years were committed by <some racial group>!" but actually it's a town of 750 people and there was only one murder during that time frame.

  • How is that misleading if it's a fact? It's only misleading if you presume to know the reaction or intent behind making such a claim, and without context we should be extremely careful in making such presumptions.

    • It's misleading because a single murder in this case is not statistically significant, but phrasing it using probabilistic terminology (i.e. percentages) obscures that fact and implies that you have enough data for the probabilistic language to be relevant.

      Choosing to use percentages when there is a countable or small amount of data is typically misleading, even though it is "technically" true. In fact, a misleading statement is almost always something that is technically a fact.
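One way to make the small-sample point concrete is to put a confidence interval around the percentage. The sketch below uses the standard Wilson score interval for a binomial proportion; treating the town example as a binomial proportion is a simplifying assumption made here for illustration, and the function name is hypothetical.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion.

    z = 1.96 corresponds to roughly 95% confidence.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = successes / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    # Clamp to [0, 1] to absorb floating-point overshoot at the boundaries.
    return max(0.0, center - half_width), min(1.0, center + half_width)


# "100%" based on a single event versus "100%" based on a hundred events.
one_case = wilson_interval(1, 1)
hundred_cases = wilson_interval(100, 100)
print(f"1 of 1:     {one_case[0]:.2f} to {one_case[1]:.2f}")
print(f"100 of 100: {hundred_cases[0]:.2f} to {hundred_cases[1]:.2f}")
```

Under these assumptions, "100%" from one event is compatible with an underlying rate anywhere from roughly 0.2 to 1.0, while "100%" from a hundred events narrows that to roughly 0.96 to 1.0. The headline percentage is identical in both cases, which is why quoting it alone can mislead.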

But the models are already more intelligent than humans, and sentient beings, right? So they should know the meanings innately. So you don’t need to explain to them what they mean.

You may give them better instructions, but they should already have the intellect to understand the assignment.

Right, right?

  • I know you're being facetious, but I think this is correct. The model might ask for clarification when given clearly borderline questions that tread the line between what is true, what is false, and even what is misleading. But there's the rub of someone being disingenuous and saying "no explanation! Just answer!" It was a trap to begin with.

    I don't think there is anything wrong with the results of this test.

    It would be more interesting if we compared them to human results.

    If you have trouble distinguishing between human and LLM results, that's interesting.

    Also, sentient is irrelevant to this test.