Comment by efromvt

5 hours ago

I’d be interested in the benchmarking if you ever write it up! People do seem to assume LLM as a judge/panel improves outcomes (and arguably it does in cases like code review?) but I suspect it is very situational and the priors from human panel of experts don’t always translate cleanly.