Comment by svieira

16 days ago

> [with] an accuracy rate of between 80-82% (humans were at 66%)

Was this human-verified in some way? If not, how did you establish the facts-on-the-ground about accuracy?

Yup, unfortunately the only way to know how good an AI is at anything is to do what you'd do with a human: build a test that you already know the answers to. That's also why the accuracy evaluation was by far the most time-intensive part of the development pipeline: we had to manually build a "ground-truth" dataset that we could evaluate the AI against.
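
A minimal sketch of that idea in Python, assuming each test item has a single known-correct label. The `accuracy` function, the item IDs, and the labels are all hypothetical toy data for illustration, not the actual pipeline or its results.

```python
def accuracy(predictions, ground_truth):
    """Return the fraction of ground-truth items answered correctly.

    Items missing from `predictions` count as wrong.
    """
    if not ground_truth:
        raise ValueError("ground-truth set is empty")
    correct = sum(
        1
        for item_id, expected in ground_truth.items()
        if predictions.get(item_id) == expected
    )
    return correct / len(ground_truth)


# Hypothetical hand-labelled answer key (the "ground truth").
ground_truth = {
    "item1": "yes",
    "item2": "no",
    "item3": "yes",
    "item4": "no",
    "item5": "yes",
}

# Hypothetical answers from the model and from a human on the same items.
model_answers = {"item1": "yes", "item2": "no", "item3": "no", "item4": "no", "item5": "yes"}
human_answers = {"item1": "yes", "item2": "yes", "item3": "no", "item4": "no", "item5": "yes"}

model_accuracy = accuracy(model_answers, ground_truth)
human_accuracy = accuracy(human_answers, ground_truth)
print(f"model: {model_accuracy:.0%}, human: {human_accuracy:.0%}")
```

Scoring the model and the humans against the same answer key is what makes the two numbers comparable.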