Comment by meisel

4 months ago

Would it help much to run it through several different AI systems and check that they agree on the result?

meisel

Yeah, that can occasionally work, and it's something we've tested, but unfortunately it introduces a lot of noise and makes systematic evals difficult.