Comment by card_zero

17 hours ago

Gee, a thing by a guy, with a name. What are you saying exactly? So the test in question is a test the LLM is asked to carry out, right? Then your point is that if it's a load of vacuous flannel 49% of the time, but meaningful 51% of the time, on average this is genuine work so we can't complain about the 49%?

Wait, you're probably talking about the test of discarding a report based on something superficial like spelling errors, which fails with LLMs due to their basic conman personalities and smooth talking. And therefore...?
