Comment by imjonse

4 months ago

Is it established whether GRPO is essential for this to work as it does, or could other RLHF-class methods provide similar results? My initial (possibly mistaken) impression was that GRPO was one of the ways of mitigating the lack of enormous hardware resources.

danielhanchen 4 months ago

Yep, GRPO is much more memory-efficient than PPO, but other RL-type algorithms can work fine as well!
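
For readers wondering where the memory saving comes from, here is a rough sketch, not taken from the thread: PPO normally trains a separate value (critic) model to estimate a baseline, while GRPO samples several completions per prompt and uses the group's own reward statistics as the baseline. The function name, the toy rewards, the `eps` value, and the choice of population standard deviation are all illustrative assumptions rather than any library's API.

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-8):
    """Score each completion relative to the other completions for the same prompt.

    Hypothetical helper: the baseline is the group mean, so no learned
    value network is required to compute advantages.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; some implementations use sample std
    return [(r - mu) / (sigma + eps) for r in rewards]

# Toy example: four sampled completions for one prompt, scored 0 or 1.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Completions that beat the group average get a positive advantage and the rest get a negative one, which is the signal the policy update then uses in place of a critic's estimate.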