Comment by Nevermark
1 hour ago
Here is one: an adjustment to weight updates that makes it more likely for weights to stay uniformly distributed.
The first graph reports ~257.5 teraflops for normally distributed data versus ~268 teraflops for uniform, a gap of roughly 4%.
I would have liked to see a straightforward graph of performance vs. clock speed for each type of data. Pick your data statistics, then pick the clock speed that gives peak performance for them.
And for actual runs, pick it from a curve sampled before the run.
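A minimal sketch of what I mean by picking the clock from a sampled curve. Everything here is an assumption for illustration: the function name, the clock values, and the throughput numbers are made up, not taken from the article. In practice the samples would come from short benchmark runs on data with the same statistics as the real job.

```python
def pick_peak_clock(samples):
    """Return the (clock_mhz, tflops) pair with the highest throughput.

    `samples` is a list of (clock_mhz, tflops) pairs measured before the run.
    """
    if not samples:
        raise ValueError("need at least one sample")
    return max(samples, key=lambda pair: pair[1])


# Hypothetical pre-run curves, one per data distribution (invented numbers).
curves = {
    "normal": [(1200, 100.0), (1400, 110.0), (1600, 105.0)],
    "uniform": [(1200, 102.0), (1400, 112.0), (1600, 115.0)],
}

# Pick the best clock separately for each kind of data.
best = {kind: pick_peak_clock(points) for kind, points in curves.items()}

for kind, (clock_mhz, tflops) in best.items():
    print(f"{kind}: run at {clock_mhz} MHz ({tflops} TFLOPS in the sample)")
```

The point is only that the best clock can differ per distribution, so the lookup is keyed on the data statistics rather than fixed once per chip.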