Comment by programjames
5 hours ago
A temperature of 1.0 is "natural units". If your energy corresponds to nats, you should be using temperature 1.0. If your energy corresponds to bits, you should be using temperature ln(2) ≈ 0.69. The optimization pressure is:
max nats = max entropy + energy / temperature
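
A rough numerical sketch of that objective, reading "energy" as expected energy under the sampling distribution and treating higher energy as better (both are assumptions of this sketch, as are the example energies and the helper names). Under that reading, the distribution that maximizes entropy + energy / temperature is the softmax of energy / temperature, and the maximum equals the log-sum-exp of the scaled energies:

```python
import math

def softmax(energies, temperature):
    # Boltzmann distribution p_i proportional to exp(E_i / T), computed stably.
    scaled = [e / temperature for e in energies]
    m = max(scaled)
    weights = [math.exp(s - m) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]

def objective(probs, energies, temperature):
    # Entropy in nats plus expected energy divided by temperature.
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    expected_energy = sum(p * e for p, e in zip(probs, energies))
    return entropy + expected_energy / temperature

def log_sum_exp(energies, temperature):
    # Closed-form maximum of the objective over all distributions.
    scaled = [e / temperature for e in energies]
    m = max(scaled)
    return m + math.log(sum(math.exp(s - m) for s in scaled))

energies = [2.0, 1.0, 0.0]  # hypothetical values, for illustration only
uniform = [1.0 / len(energies)] * len(energies)

for temperature in (1.0, math.log(2)):
    boltzmann = softmax(energies, temperature)
    best = objective(boltzmann, energies, temperature)
    print(temperature, best, objective(uniform, energies, temperature))
```

At either temperature the softmax distribution scores at least as high as the uniform one, and lowering the temperature shifts weight toward the highest-energy option.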
Why might energy correspond to bits or nats? Imagine your goal is to play as many interesting games of chess as possible in a tournament. This implies you have to keep winning. If you look at the RL environment from the right perspective, you can turn it into optimizing bits or nats.