Slacker News

Comment by storus

3 days ago

RL is extremely brittle, and it's often difficult to make it converge. Even Stanford folks admit that. Are there any solutions for this?

2 comments

mountainriver  3 days ago

FlowRL is one: it learns the full distribution of rewards rather than just optimizing toward a single maximum.

  • storus  3 days ago

    Thanks, that looks very promising!
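
For anyone skimming, here is a toy sketch of the contrast mentioned above between optimizing toward a single maximum and matching a distribution over rewards. It is not FlowRL's actual implementation. The rewards, the temperature `beta`, and all function names are made up for illustration, and the "probability proportional to exp(beta * reward)" target is an assumption about what "learning the full distribution of rewards" roughly means.

```python
import math

def reward_maximizing_target(rewards):
    """All probability mass on the single highest-reward candidate."""
    best = max(range(len(rewards)), key=lambda i: rewards[i])
    return [1.0 if i == best else 0.0 for i in range(len(rewards))]

def reward_matching_target(rewards, beta=1.0):
    """Probability proportional to exp(beta * reward) (hypothetical target)."""
    top = max(rewards)  # subtract the max for numerical stability
    weights = [math.exp(beta * (r - top)) for r in rewards]
    total = sum(weights)
    return [w / total for w in weights]

def entropy(probs):
    """Shannon entropy in nats; zero-probability entries contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

# Made-up rewards for three candidate answers; the first two are nearly tied.
rewards = [1.0, 0.9, 0.2]
greedy = reward_maximizing_target(rewards)
matched = reward_matching_target(rewards, beta=2.0)

print("maximizing:", greedy, "entropy:", entropy(greedy))
print("matching:  ", [round(p, 3) for p in matched], "entropy:", round(entropy(matched), 3))
```

In this toy setup the maximizing target collapses onto one answer even though the second is almost as good, while the matching target keeps probability on both. That extra spread is one plausible reason a distribution-matching objective might be less brittle, though the sketch says nothing about convergence in real training.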
