Comment by danbruc

3 days ago

You can sample the result to determine the error rate, but if you find an unacceptable level of errors, you still have to review everything manually, because the sample only tells you how many records are wrong, not which ones. On the other hand, if you use traditional techniques such as pattern matching with regular expressions, you can probably get pretty close to perfection for the cases where your patterns match, and you can simply reject the rest for manual processing. Maybe you could ask a language model to compare the source document with the extracted data and indicate whether there are errors, but I am not sure that would help; whatever tripped up the extraction might also trip up the evaluation of the result.
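To illustrate the "match strictly, reject the rest" idea, here is a minimal Python sketch. It assumes a hypothetical one-line invoice format; the pattern, the `Invoice` fields, and the `extract` function are invented for illustration and are not taken from any real pipeline. A line is accepted only if the regex matches it completely. Anything else goes to a manual-review list instead of being guessed at.

```python
import re
from dataclasses import dataclass

# Hypothetical format, e.g. "Invoice INV-00042 dated 2024-03-01 total 199.99 EUR".
# Field names and layout are assumptions for illustration only.
INVOICE_PATTERN = re.compile(
    r"Invoice\s+(?P<number>INV-\d{5})\s+dated\s+(?P<date>\d{4}-\d{2}-\d{2})"
    r"\s+total\s+(?P<amount>\d+\.\d{2})\s+(?P<currency>[A-Z]{3})"
)


@dataclass
class Invoice:
    number: str
    date: str
    amount: str
    currency: str


def extract(lines):
    """Split lines into (extracted records, lines needing manual review)."""
    extracted, manual = [], []
    for line in lines:
        # fullmatch: the whole line must fit the pattern, no partial hits.
        m = INVOICE_PATTERN.fullmatch(line.strip())
        if m:
            extracted.append(Invoice(**m.groupdict()))
        else:
            # Reject rather than guess; a human looks at these.
            manual.append(line)
    return extracted, manual


docs = [
    "Invoice INV-00042 dated 2024-03-01 total 199.99 EUR",
    "Invoice #42, March 1st, about 200 euros",
]
ok, review = extract(docs)
```

The strictness is the point: the accepted records are only as trustworthy as the pattern, but every input the pattern was not designed for ends up in `review` rather than silently producing wrong data.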