Comment by eurekin
19 hours ago
Batching lowers that, since the model weights are read from memory once per step and shared across the whole batch. Activation memory (e.g. the per-sequence KV cache) doesn't scale as nicely: it still has to be read for every sequence individually.
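A back-of-envelope sketch of the point above, with assumed numbers (a 7B-parameter model at fp16 and a hypothetical per-sequence activation read): weight traffic amortizes as 1/batch, while per-sequence activation traffic stays constant.

```python
def bytes_per_token(weight_bytes, act_bytes_per_seq, batch):
    # Weights are streamed once per decode step and amortized over the
    # batch; per-sequence activations (e.g. KV cache) are read per sequence.
    return weight_bytes / batch + act_bytes_per_seq

W = 14e9   # assumed: 7B params at fp16 ~= 14 GB of weights
A = 0.5e9  # assumed: per-sequence activation/KV-cache traffic per step

for b in (1, 8, 64):
    print(f"batch={b:>2}: {bytes_per_token(W, A, b)/1e9:.2f} GB/token")
```

At batch 1 the weight reads dominate; at batch 64 the fixed per-sequence activation term starts to dominate instead, which is why the scaling isn't as nice.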