Comment by porridgeraisin
7 days ago
Yep. Offline RL is especially full of these types of papers too. The sheer number of alternatives to the KL divergence for keeping the learned policy from drifting too far from the collected data distribution... There's probably one method for each person on earth.