Comment by OsamaJaber
17 days ago
Small models in the browser are a different optimization problem than small models on a server. On a server you chase throughput, so you batch. In the browser you're stuck at batch size 1, which means kernel launch overhead and memory bandwidth dominate, not FLOPs.
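
To put rough numbers on the bandwidth half of that, here is a back-of-the-envelope sketch of arithmetic intensity (FLOPs per byte moved) for a single dense layer. The 4096x4096 layer size, the fp16 (2-byte) values, and the function name are all assumptions picked for illustration. Kernel launch overhead is not modeled at all.

```python
def arithmetic_intensity(batch, d_in, d_out, bytes_per_value=2):
    """FLOPs per byte moved for one dense layer y = x @ W.

    Assumes every weight and activation is read or written exactly once
    (no cache reuse) and fp16 storage by default. Illustrative only.
    """
    # One multiply and one add per weight, per row in the batch.
    flops = 2 * batch * d_in * d_out
    # The whole weight matrix is read once regardless of batch size.
    weight_bytes = d_in * d_out * bytes_per_value
    # Input activations read plus output activations written.
    activation_bytes = batch * (d_in + d_out) * bytes_per_value
    return flops / (weight_bytes + activation_bytes)


if __name__ == "__main__":
    # Hypothetical layer size; real models vary.
    for batch in (1, 8, 64, 512):
        print(batch, round(arithmetic_intensity(batch, 4096, 4096), 2))
```

At batch size 1 this comes out just under 1 FLOP per byte, since each weight is loaded to do a single multiply-add. Batching reuses the same weight read across rows, so intensity climbs roughly with batch size until activation traffic starts to matter.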