Comment by glorpsicle
3 days ago
Perhaps there's still value in the documents being transformed by this tool and someone reviewing them manually, but obviously the real value would be in reducing manual review. I don't think there's a world, for now at least, in which manual review can be completely eliminated.
However, if you process, say, 1 million documents, you could sample a small percentage of them and review only those manually (a power calculation would help you size the sample). Assuming your random sample is representative of the full set (whose "distribution" may be tough to define or summarize), you could then extrapolate the measured accuracy onto the larger set without having to review each and every document.
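As a rough illustration of how small that sample can be, here's a back-of-the-envelope sample-size calculation (a minimal sketch using the normal approximation for a proportion; the function name and the numbers are invented for illustration):

```python
import math

def sample_size_for_error_rate(expected_error_rate: float,
                               margin_of_error: float,
                               confidence: float = 0.95) -> int:
    """Documents to review to estimate an error rate within a given
    margin of error, via the normal approximation to the binomial."""
    # z-scores for common confidence levels (avoids a scipy dependency)
    z = {0.90: 1.645, 0.95: 1.96, 0.99: 2.576}[confidence]
    p = expected_error_rate
    n = (z ** 2) * p * (1 - p) / (margin_of_error ** 2)
    return math.ceil(n)

# e.g. to pin down a ~2% error rate to within +/-0.5% at 95% confidence:
print(sample_size_for_error_rate(0.02, 0.005))  # 3012 documents
```

Note the required sample size doesn't grow with the corpus: roughly 3,000 reviewed documents out of 1 million, well under 1%.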
You can sample the results to estimate the error rate, but if you find an unacceptable level of errors, you still have to review everything manually.

On the other hand, if you use traditional techniques (pattern matching with regular expressions and the like), you can probably get close to perfection for the cases where your patterns match, and simply reject the rest for manual processing.

Maybe you could ask a language model to compare the source document against the extracted data and indicate whether there are errors, but I'm not sure that would help: whatever tripped up the extraction might also trip up the evaluation.
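A minimal sketch of that match-or-reject approach (the patterns and field names here are invented for illustration; real ones would depend on your documents):

```python
import re
from typing import Optional

# Hypothetical patterns for one document type.
INVOICE_NUMBER = re.compile(r"Invoice\s*#?\s*(\d{6,10})")
TOTAL_AMOUNT = re.compile(r"Total[:\s]+\$?([\d,]+\.\d{2})")

def extract(text: str) -> Optional[dict]:
    """Return extracted fields if every pattern matches,
    otherwise None to flag the document for manual review."""
    inv = INVOICE_NUMBER.search(text)
    amt = TOTAL_AMOUNT.search(text)
    if not (inv and amt):
        return None  # reject: route to a human
    return {"invoice_number": inv.group(1),
            "total": amt.group(1).replace(",", "")}

documents = [
    "Invoice #1234567 ... Total: $1,299.00",   # patterns match
    "a scanned page the patterns don't cover", # goes to review
]
matched, needs_review = [], []
for doc in documents:
    result = extract(doc)
    (matched if result else needs_review).append((doc, result))
```

The point is the failure mode: when the patterns don't match you know it immediately and can escalate, whereas a model's extraction errors are silent until someone checks.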