Comment by orasis
2 days ago
It’s best for immediate rewards. If you have delayed rewards, there is a paper on sampling from the “delay distribution” that solves this.