Comment by brookst

1 day ago

Have you ever hired human evaluators at scale? They make all sorts of mistakes. Relatively low probability, so it’s a noise factor, but I have yet to meet the human who is 100% accurate at simple tasks done thousands of times.

Which is why you hire them at scale, as you say: then they are very reliable. LLMs at scale are not.

The problem with these AI models is that there is no point at which you can just scale them up and have them solve problems as accurately as a group of humans. They add too much noise and eventually go haywire when left to their own devices.

  • I haven’t found that to be the case. Both LLMs and humans produce outputs that cannot be blindly trusted to be accurate.