Comment by nekusar

3 days ago

Wellll, that rug ain't gonna pull itself, now is it?

I've been calling local LLMs "owning the means of production" for a while now. I ain't wrong.

Not as simple as that. Everyone would happily run local, but the issue is that local sucks.

  • https://github.com/brontoguana/krasis

On my desktop RTX 5060 Ti (16 GB) with 96 GB of system RAM, I routinely get 25-30 tokens/sec from an 80B model quantized to int8. It uses 65 GB of system RAM and 15 GB of VRAM (quick sanity math after this list).

And it's plenty fast for many of my purposes.

I could easily run a 30B model at bf16 (full precision) and get something like 50 tok/s.
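
For anyone sanity-checking those numbers, here's a back-of-the-envelope sketch (my own arithmetic, not anything from krasis; it assumes weights dominate the footprint, int8 = 1 byte/param, bf16 = 2 bytes/param, and ignores KV cache and runtime overhead):

```python
def weight_gb(params_billions: float, bytes_per_param: float) -> float:
    """Approximate weight footprint in GB (1 GB = 1e9 bytes)."""
    return params_billions * 1e9 * bytes_per_param / 1e9

# 80B model at int8: ~80 GB of weights, which lines up with the
# reported 65 GB system RAM + 15 GB VRAM split almost exactly.
print(weight_gb(80, 1))  # 80.0

# 30B model at bf16: ~60 GB, so it also fits in the same 96 GB RAM
# + 16 GB VRAM box, with more headroom -> plausibly higher tok/s.
print(weight_gb(30, 2))  # 60.0
```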