Comment by anticensor
6 days ago
> if you could do this automatically, it would be game changer as you could run top 5 best models in parallel and select best answer every time
Remember, they have access to the RLHF reward model: they can score all N outputs against it and send back whichever one it rates highest (the most "rewarded" answer).
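
For what it's worth, the selection step is just best-of-N sampling, and a rough sketch fits in a few lines of Python. Everything named here is an assumption for illustration: `best_of_n` and `toy_reward` are hypothetical, and `toy_reward` is a crude word-overlap stand-in for a learned reward model, not how any provider actually scores outputs.

```python
from typing import Callable, Sequence


def best_of_n(
    prompt: str,
    candidates: Sequence[str],
    reward: Callable[[str, str], float],
) -> str:
    """Return the candidate that the reward function scores highest."""
    if not candidates:
        raise ValueError("need at least one candidate")
    # Score every candidate against the same prompt and keep the top one.
    return max(candidates, key=lambda c: reward(prompt, c))


def toy_reward(prompt: str, answer: str) -> float:
    # Hypothetical stand-in for a reward model: counts prompt words
    # that also appear in the answer (whitespace-split, lowercased).
    prompt_words = set(prompt.lower().split())
    answer_words = set(answer.lower().split())
    return float(len(prompt_words & answer_words))


# Pretend these came from N different models (or N samples of one model).
outputs = [
    "I do not know.",
    "Paris is the capital of France.",
    "France is a country.",
]
best = best_of_n("what is the capital of france", outputs, toy_reward)
print(best)
```

In a real setup the `reward` callable would wrap an actual reward model, and the cost is N generations plus N scoring passes per request.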