Comment by hamiecod 7 days ago

That's a strong RL technique that could equal the quality of RLHF.