Comment by bandrami
4 hours ago
"Do these documents contain models or descriptions of (list of devices redacted for HN), or personally identifying information?" would be a great question to be able to automate, since it sucks up a lot of time that could be more profitably spent doing other things. There are costs to both Type I and Type II errors, so deterministic filters only get us so far (which isn't very far).
If it were incorrect 10% of the time, would it still be of help?
Our pre-LLM system does better than that, but any improvement would help us do more lucrative things with our labor hours.
I am left wondering: if this is such a critical task, how would even a 1% error rate reduce the need for human review of all outputs?
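For what it's worth, the trade-off being argued over here can be put into rough numbers. The following is only a minimal sketch, not anything from the thread: the function name, the prevalence, the error rates, and the per-error costs are all hypothetical placeholders. It computes the expected error cost per document for a binary filter, weighting Type I errors (false positives) and Type II errors (false negatives) separately.

```python
def expected_cost_per_doc(prevalence, fn_rate, fp_rate, fn_cost, fp_cost):
    """Expected error cost per document for a binary filter.

    prevalence: fraction of documents that really contain flagged content
    fn_rate:    Type II rate, P(filter says clean | document is flagged)
    fp_rate:    Type I rate,  P(filter says flagged | document is clean)
    fn_cost:    cost of one missed document (hypothetical units)
    fp_cost:    cost of one false alarm (hypothetical units)
    """
    missed = prevalence * fn_rate * fn_cost
    false_alarm = (1 - prevalence) * fp_rate * fp_cost
    return missed + false_alarm


if __name__ == "__main__":
    # All numbers below are made up purely for illustration.
    for rate in (0.10, 0.01):
        cost = expected_cost_per_doc(
            prevalence=0.1, fn_rate=rate, fp_rate=rate, fn_cost=100, fp_cost=1
        )
        print(f"error rate {rate:.0%}: expected cost per document = {cost:.3f}")
```

Whether a given error rate lets anyone skip human review depends entirely on how the two costs compare, which only the people doing the review can estimate.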