Comment by steer_dev

2 months ago

You're 100% right. For a "judgment" task like "Does this patient have cancer?", the final acceptance criteria must be a human expert. A purely deterministic verifier is impossible.

My thesis is that even in those "fuzzy" workflows, the agent's process is full of small, deterministic sub-tasks that can and should be verified.

For example, before the AI even attempts to analyze the X-ray for cancer, it must: 1/ Verify it has the correct patient file (PatientIDVerifier). 2/ Verify the image is a chest X-ray and not a brain MRI (ModalityVerifier). 3/ Verify the date of the scan is within the relevant timeframe (DateVerifier).

These are "boring," deterministic checks. But a failure on any one of them makes the final "judgment" output completely useless.

steer isn't designed to automate the final, high-stakes judgment. It's designed to automate the pre-flight checklist, ensuring the agent has the correct, factually grounded information before it even begins the complex reasoning task. It's about reducing the "unforced errors" so the human expert can focus only on the truly hard part.

2 comments

steer_dev

malfist 2 months ago

Why do any of those checks with ai though? All of them you can get a less error prone answer without ai.

jennyholzer 2 months ago

Robo-eugenics is the best answer I can come up with