Comment by eurekin
19 hours ago
Batching lowers that, since the model weights are read from memory once per step and shared across the whole batch. Activation memory (e.g. the per-sequence KV cache) doesn't scale as nicely: it still has to be read for every sequence individually.
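A back-of-envelope sketch of the point above, with assumed numbers (a 7B-parameter model at fp16 and a hypothetical per-sequence activation read): weight traffic amortizes as 1/batch, while per-sequence activation traffic stays constant.

```python
def bytes_per_token(weight_bytes, act_bytes_per_seq, batch):
    # Weights are streamed once per decode step and amortized over the
    # batch; per-sequence activations (e.g. KV cache) are read per sequence.
    return weight_bytes / batch + act_bytes_per_seq

W = 14e9   # assumed: 7B params at fp16 ~= 14 GB of weights
A = 0.5e9  # assumed: per-sequence activation/KV-cache traffic per step

for b in (1, 8, 64):
    print(f"batch={b:>2}: {bytes_per_token(W, A, b)/1e9:.2f} GB/token")
```

At batch 1 the weight reads dominate; at batch 64 the fixed per-sequence activation term starts to dominate instead, which is why the scaling isn't as nice.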