Comment by eurekin
15 hours ago
Batching lowers that cost, since the model weights are read from memory once per batch step instead of once per token. Activation memory (e.g. the KV cache) doesn't scale as nicely, since each sequence in the batch still carries its own activations.
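A rough back-of-envelope sketch of that amortization, using hypothetical numbers (a 7B-parameter fp16 model and a made-up per-sequence KV-cache read size), not figures from the comment:

```python
# Memory traffic per generated token at a given batch size.
# All sizes are hypothetical illustrations, not measurements.
WEIGHT_BYTES = 7e9 * 2      # 7B params in fp16, read once per forward pass
KV_BYTES_PER_SEQ = 2e9      # assumed per-sequence KV-cache (activation) read

def bytes_per_token(batch_size: int) -> float:
    # Weights are shared across the batch, so their read is amortized;
    # each sequence still reads its own activations, so that term is flat.
    return WEIGHT_BYTES / batch_size + KV_BYTES_PER_SEQ

for b in (1, 8, 64):
    print(f"batch={b:3d}: {bytes_per_token(b) / 1e9:.2f} GB/token")
```

The weight term shrinks as 1/batch, but the activation term doesn't, which is why batching stops helping once activations dominate the traffic.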