Comment by mountainriver
3 days ago
FlowRL is one: it learns the full distribution of rewards rather than just optimizing toward a single maximum.
Thanks, that looks very promising!