nl 5 days ago
Taalas is interesting. 16,000 TPS for Llama on a chip. https://taalas.com/

  micw 5 days ago
  On a very old model, it's more like 16,000 garbage words/s.

    nl 5 days ago
    Llama 3.1 8B is pretty useful for some things. I use it to generate SQL pretty reliably, for example. They are doing an updated model in a month or so anyway, then a frontier-level one "by summer".

      numeri 5 days ago
      But Taalas had to quantize Llama 3.1 8B to death to get it to fit. It can't produce coherent non-English text at all.

    patapong 5 days ago
    I do wonder if there are tasks where 16k garbage words/s are more useful than 200 good words per second. Does anyone have any ideas? Data extraction, perhaps?

      pnocera 4 days ago
      A politician communication agent, maybe...

  Nihilartikel 5 days ago
  Neat! I had been wondering if anyone was trying to implement a model in silico. We're getting closer to having chatty talking toasters every day now!

    empath75 5 days ago
    "What is my purpose..." https://www.youtube.com/watch?v=sa9MpLXuLs0

      Nihilartikel 1 day ago
      Even more on the nose: https://youtu.be/LRq_SAuQDec?si=CAe210GZ_lKcc6_Y

  DeathArrow 5 days ago
  I wonder how many tokens per second they could get if they put Mercury 2 on a chip.

  replete 5 days ago
  It's exciting to see, but look at the die size for only an 8B model.
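The quantization point above can be made concrete with rough memory arithmetic. A minimal sketch of weights-only footprint at different bit widths; the round 8B parameter count and the precision choices are illustrative assumptions, not Taalas specifications:

```python
# Rough weights-only memory footprint of an ~8B-parameter model
# at a few common quantization bit widths (illustrative only).
PARAMS = 8_000_000_000  # Llama 3.1 8B, approximate parameter count

def weight_bytes(params: int, bits: int) -> int:
    """Bytes needed just to store the weights at a given bit width."""
    return params * bits // 8

for bits in (16, 8, 4, 2):
    gib = weight_bytes(PARAMS, bits) / 2**30
    print(f"{bits:>2}-bit: {gib:5.1f} GiB")
```

Even at 4 bits per weight, the weights alone are a few GiB, which hints at why squeezing the model onto a single die forces aggressive quantization (and why quality, such as non-English coherence, can suffer).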