Comment by orasis
6 months ago
It works best with immediate rewards. If your rewards are delayed, there is a paper on sampling from the “delay distribution” that addresses this.