Comment by Palmik
3 months ago
Inference for single example is memory bound. By doing batch inference, you can interleave computation with memory loads, without losing much speed (up until you cross the compute bound threshold).
3 months ago
Inference for single example is memory bound. By doing batch inference, you can interleave computation with memory loads, without losing much speed (up until you cross the compute bound threshold).
No comments yet
Contribute on Hacker News ↗