Comment by mountainriver
2 months ago
FlowRL is one. It learns to match the full distribution of rewards rather than just optimizing toward a single maximum.
2 months ago
Thanks, that looks very promising!