Comment by yorwba
6 hours ago
What is classic about "skip updating parameters with high gradient/loss variance in multiple batches/samples"? Do you have a particular algorithm in mind that uses this heuristic?
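For concreteness, here is a minimal sketch of what such a heuristic could look like, assuming PyTorch; the function name, the variance threshold, and the stacked per-batch gradients are all illustrative assumptions, not a reference to any particular published algorithm:

```python
import torch

def variance_gated_step(param, per_batch_grads, lr=0.01, var_threshold=1.0):
    """SGD-style update that skips parameter entries whose gradient
    variance across several micro-batches exceeds a threshold.

    per_batch_grads: tensor of shape (num_batches, *param.shape),
    one gradient estimate per micro-batch (illustrative setup).
    """
    mean_grad = per_batch_grads.mean(dim=0)
    grad_var = per_batch_grads.var(dim=0)   # per-entry variance across batches
    stable = grad_var < var_threshold       # update only low-variance entries
    param.data -= lr * torch.where(stable, mean_grad,
                                   torch.zeros_like(mean_grad))
```

Note that this sketch gates each parameter entry independently; a per-tensor or per-layer variant would instead compare an aggregate variance statistic against the threshold.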