
Comment by trollbridge

5 days ago

The amount of effort to click an LLM’s sources is, what, 20 seconds? Was a human in the loop for sourcing that article at all?

Humans aren't very diligent in the long term. If an LLM does something correctly enough times in a row (or close enough), humans are likely to stop checking its work thoroughly enough.

This isn't exactly a new problem; we do it with any bit of new software/hardware, not just LLMs. We check its work when it's new, and then tend to trust it over time as it proves itself.

But it seems to be hitting us worse with LLMs, as they are less consistent than previous software. And LLM hallucinations are particularly dangerous, because they are often plausible enough to pass the sniff test. We just aren't used to handling something this unpredictable.

  • It’s a core part of the job and there’s simply no excuse for complacency.

    • This is a first degree expectation of most businesses.

      What the OP pointed out is a fact of life.

      We do many things to ensure that humans don’t get “routine fatigue”, like pointing at each item before a train leaves the station so your eyes don’t glaze over during the safety checklist.

      This isn’t an excuse for the behavior. Its more about what the problem is and what a corresponding fix should address.


    • The words on the page are just a medium to sell ads. If shit gets ad views then producing shit is part of the job... unless you're the one stepping up to cut the checks.


  • There's a weird inconsistency among the more pro-AI people: they expect this output to pass as human, but then don't give it the review that an outsourced human would get.

    • > but then don't give it the review that an outsourced human would get.

      It's like seeing a dog play basketball badly. You're too stunned to be like "no, don't sign him to <home team>".


  • The irony is that, while far from perfect, an LLM-based fact-checking agent is likely to be far more diligent (though it still needs human review), because it's trivial to ensure it has no memory of having already worked through a long list of checks. If you pass e.g. Claude a long list directly in the same context, it is prone to deciding the task is “tedious” and starting to take shortcuts; a rough sketch of the one-claim-per-call setup is at the end of this comment.

    But at the same time, doing that makes it even more likely the human in the loop will get sloppy, because there'll be even fewer cases where their input is actually needed.

    I'm wondering if you need to start inserting intentional canaries to check whether humans are actually doing sufficiently thorough reviews.
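
    A minimal sketch of that per-item, fresh-context approach, assuming the Anthropic Python SDK and an ANTHROPIC_API_KEY in the environment; the function names, prompt wording, and model id are illustrative assumptions, not anyone's actual pipeline:

        from anthropic import Anthropic

        client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment
        MODEL = "claude-sonnet-4-5"  # placeholder model id; use whatever you have access to

        def check_claim(claim: str, source_text: str) -> str:
            # One claim, one conversation: the model never sees the rest of the
            # batch, so it can't decide the list is "tedious" and start skimming.
            resp = client.messages.create(
                model=MODEL,
                max_tokens=500,
                messages=[{
                    "role": "user",
                    "content": (
                        "Does the source below support this claim? Answer SUPPORTED, "
                        "CONTRADICTED, or NOT FOUND, then quote the relevant passage.\n\n"
                        f"Claim: {claim}\n\nSource:\n{source_text}"
                    ),
                }],
            )
            return resp.content[0].text

        # Each (claim, source) pair gets its own call instead of one long prompt.
        article_text = open("article.txt").read()  # whatever source you're checking against
        claims_with_sources = [("The study covered 10,000 patients.", article_text)]
        results = [check_claim(c, s) for c, s in claims_with_sources]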

The kind of people who use an LLM to write news articles for them tend not to be the people who care about mundane things like reading sources or ensuring that what they write bears any resemblance to the truth.

The source would just be the article, which the Ars author used an LLM to avoid reading in the first place.

The problem is that the LLM's sources can themselves be LLM-generated. I was looking up some health question and tried clicking through to the source for one of the LLM's claims. The source was a blog post that contained an obvious hallucination or false elaboration.

If a human had enough time to check all the sources, they wouldn't have been using an LLM to write for them.