Comment by konaraddi
1 day ago
> Taalas’ silicon Llama achieves 17K tokens/sec per user, nearly 10X faster than the current state of the art, while costing 20X less to build, and consuming 10X less power.
Insane gains; this makes me excited for the future. Imagine Opus-like responses in under a second.
I suspect the power-efficiency gains will be almost entirely offset by increased usage, but it’s still more bang per watt.