Comment by nl

5 days ago

Taalas is interesting. 16,000 TPS for Llama on a chip.

https://taalas.com/

On a very old model, though. It's more like 16,000 garbage words/s.

  • Llama 3.1 8B is pretty useful for some things. I use it to generate SQL pretty reliably, for example.

    They are doing an updated model in a month or so anyway, then a frontier-level one "by summer".

    • But Taalas had to quantize Llama 3.1 8B to death to get it to fit. It can't produce coherent non-English text at all.

  • I do wonder whether there are tasks where 16k garbage words/s is more useful than 200 good words/s. Does anyone have any ideas? Data extraction, perhaps?