Comment by layoric
4 days ago
Thanks for posting the performance numbers from your own validation. 6-7 tokens/sec is quite remarkable for the hardware.
After some more benchmarking with larger outputs (like writing an entire, relatively complex TODO list app), it seems to drop to 4-6 tokens/s. Still impressive.
Decided to run an actual llama-bench run and let it go for the hour or two it needs. I'm posting my full results here (https://github.com/geerlingguy/ai-benchmarks/issues/47), but in short: 8-10 t/s prompt processing (pp) and 7.99 t/s token generation (tg128). This is on a Pi 5 with no overclocking; the numbers could probably be increased slightly with an overclock.
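For anyone wanting to reproduce this, a llama-bench invocation along these lines should produce comparable pp/tg numbers. This is a sketch, not the exact command I ran; the model path is a placeholder, and the flags (`-p` for prompt-processing token count, `-n` for generated tokens, `-t` for threads) are the standard llama.cpp llama-bench options:

```shell
# Benchmark prompt processing (512 tokens) and token generation (128 tokens)
# using all 4 of the Pi 5's cores. Replace the model path with your own GGUF.
./llama-bench -m models/your-model.gguf -p 512 -n 128 -t 4
```

llama-bench prints a table with `pp512` and `tg128` rows reporting tokens per second; expect it to take a while on a Pi-class CPU.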
You need a fan/heatsink to get that speed, of course; it maxes out the CPU the entire time.