Comment by monkeydust
2 days ago
I have been thinking about this a bit. Rather than rely on one model, have an agentic setup that takes a question, runs it against the top 3, and then has another model judge the responses and decide what to give back.
Is anyone doing this for high-stakes questions / research?
The argument against is that the models are fairly 'similar', as outlined in one of the awarded papers from NeurIPS '25: https://neurips.cc/virtual/2025/loc/san-diego/poster/121421
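For anyone who wants to try it, here is a rough sketch of the fan-out-then-judge idea. Everything in it is an assumption for illustration: `Model` is just "any function that takes a prompt and returns text", and `ask_panel` is a made-up name, so you would wrap whatever client you actually use. Hiding the model names from the judge is my own design choice, not something from the paper.

```python
from typing import Callable, Dict

# Hypothetical type: any function that takes a prompt and returns text.
Model = Callable[[str], str]


def ask_panel(question: str, panel: Dict[str, Model], judge: Model) -> str:
    """Send one question to every model in `panel`, then let `judge` decide."""
    # Fan the question out to each model on the panel.
    answers = {name: model(question) for name, model in panel.items()}

    # Label candidates A, B, C... so the judge does not see model names.
    labelled = "\n\n".join(
        f"Answer {chr(65 + i)}:\n{text}"
        for i, text in enumerate(answers.values())
    )

    judge_prompt = (
        f"Question:\n{question}\n\n"
        f"{labelled}\n\n"
        "Compare the answers, point out disagreements, "
        "and give the single best final answer."
    )
    return judge(judge_prompt)
```

Note that if the panel models really are as similar as the paper suggests, their errors will be correlated and the judge will mostly see three versions of the same answer, so this is not a substitute for checking sources.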
I often put the models in direct conversation with each other to work out a framework or solution. It works pretty well, but they do tend to glaze each other a bit.