Comment by mrkn1
3 days ago
It uses small, ONNX-optimized models designed specifically for low-latency CPU streaming, so it avoids the overhead of large transformer architectures and GPU memory transfers.
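For anyone wondering what "CPU streaming" looks like in practice, here is a rough sketch of the general pattern: cut the input into small fixed-size frames and carry a little state from one frame to the next, so each step stays cheap and nothing has to be copied to a GPU. This is not the project's actual code. `frames`, `tiny_model`, and `run_stream` are hypothetical names, and `tiny_model` is a toy stand-in (a leaky average of frame energy) for wherever a real ONNX inference call would go.

```python
from typing import Iterator, List, Sequence, Tuple


def frames(samples: Sequence[float], frame_size: int) -> Iterator[List[float]]:
    """Yield consecutive fixed-size frames, zero-padding the final one."""
    for start in range(0, len(samples), frame_size):
        frame = list(samples[start:start + frame_size])
        frame.extend([0.0] * (frame_size - len(frame)))
        yield frame


def tiny_model(frame: List[float], state: float) -> Tuple[float, float]:
    """Hypothetical stand-in for a small model.

    Computes a leaky average of frame energy. In a real setup this is
    where a per-frame ONNX inference call would run (assumption).
    """
    energy = sum(x * x for x in frame) / len(frame)
    new_state = 0.5 * state + 0.5 * energy
    return new_state, new_state


def run_stream(samples: Sequence[float], frame_size: int) -> List[float]:
    """Process the input frame by frame, carrying state between frames."""
    state = 0.0
    outputs = []
    for frame in frames(samples, frame_size):
        out, state = tiny_model(frame, state)
        outputs.append(out)
    return outputs


if __name__ == "__main__":
    print(run_stream([1.0, 1.0, 1.0, 1.0], frame_size=2))
```

The point of the pattern is that per-frame work is bounded by the frame size and the model size, not by the length of the whole input, which is why a small model can keep up on a CPU.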