Comment by derbOac

2 days ago

Apparently we should hire the Guardian to evaluate LLM output accuracy?

Why are these products being released for these kinds of tasks with no attempt to quantify their accuracy?

In many areas AI has become this toy that we use because it looks real enough.

It sometimes works for some things in math and science because we test its output, but in general you don't ask Gemini a question and have it reply "there's an 80% chance this is correct." At least then you could evaluate that claim.
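
To make that concrete: if models did attach a confidence number to each answer (which, to be clear, Gemini doesn't expose today), evaluating the claim would be straightforward. Here's a minimal sketch of my own, with made-up toy data, that checks whether stated confidences match observed accuracy bucket by bucket:

```python
# Hypothetical sketch: given (stated_confidence, was_correct) pairs,
# check whether "80% confident" answers are actually right ~80% of the time.
from collections import defaultdict

def calibration_report(results, n_buckets=10):
    """results: list of (stated_confidence, was_correct) pairs,
    e.g. (0.8, True) for an answer claimed 80% likely to be right."""
    buckets = defaultdict(list)
    for conf, correct in results:
        # Bin confidences in [0, 1] into n_buckets equal-width buckets.
        b = min(int(conf * n_buckets), n_buckets - 1)
        buckets[b].append((conf, correct))
    for b in sorted(buckets):
        pairs = buckets[b]
        mean_conf = sum(c for c, _ in pairs) / len(pairs)
        accuracy = sum(ok for _, ok in pairs) / len(pairs)
        print(f"claimed ~{mean_conf:.0%}, actually correct {accuracy:.0%} "
              f"(n={len(pairs)})")

# Toy data for illustration only:
calibration_report([(0.8, True), (0.8, True), (0.8, False), (0.8, True),
                    (0.6, False), (0.6, True), (0.9, True), (0.9, True)])
```

The point isn't this particular script; it's that a stated probability is a falsifiable claim, and right now there's nothing like it to falsify.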

There's a kind of task LLMs aren't well suited to because there's no intrinsic empirical verifiability, for lack of a better way of putting it.