Comment by embedding-shape

4 hours ago

> These aren't benchmark items with public answer keys — they're claims real users submitted for verification to a fact-checking platform.

Cool.

I wonder if any of this matters when the authors don't disclose exactly how much of their report was written and made with LLMs in the first place? There is even an "11. Ethics & data use" section, and the research is about LLMs being infallible in some ways, yet the usage of LLMs for the production of this report isn't mentioned even once.

8 comments

kostaj 4 hours ago

Data collection and processing was done manually. LLMs helped with the report drafting. Everything was human reviewed before publishing.

embedding-shape 4 hours ago
So it's not a secret; why don't you add this upfront in the report? The report itself is even about LLMs, so it makes a lot of sense to disclose your usage of them for writing the report, especially when you're presenting evidence that boils down to LLMs being infallible.
- rpdillon 2 hours ago
  
  I think you mean fallible.
  It's also a bit weird to "disclose use of LLMs". It rubs me wrong, the same way parents breathlessly talking about "screen time" rubbed me wrong: it's too general, and with such a broad brush, it's going to sweep up a bunch of perfectly fine usage with a bunch of dubious usage. On the flip side, if folks do start disclosing all the time, it's going to turn into Prop 65 warnings in CA, where everything says it has lead in it, so folks pretty much ignore it and move on.
  If the report's conclusions and reasoning lean on LLMs, or if the data processing itself was done with LLMs, that would be interesting, and I wouldn't treat it as some sort of disclosure, but rather discuss it under methodology. Using LLMs to polish the language a bit after writing an initial draft with key findings? Much less interesting.
  I realize this is now a religious issue, and some folks are allergic to anything that touched an LLM. I just don't think that perspective is going to end up having a good shelf life.
- kostaj 4 hours ago
  
  It's an omission on my side. Will add in the next version.
  
Aurornis 3 hours ago

> LLMs helped with the report drafting. Everything was human reviewed before publishing.
This is becoming the classic way of admitting that an LLM wrote it.
Leaving that out of the report validated the complaint above.