Comment by unleaded
8 hours ago
Qwen3.6-35B-A3B-UD-Q4_K_M runs at about 11 tokens/second on my poor old 1060. Absolutely nuts how far we've come
Every model I've tried running on my 1070 instantly crashes my old tower. Probably time to get off Windows and run Linux on it.
It's understated how much of a boon AI development has been for Linux.
There isn’t any benefit to running a Windows machine.
Au contraire, I run models on WSL and my desktop reliably wakes up from sleep. Best of both worlds.
Sounds like a hardware issue. NVIDIA driver issues can't be ruled out, though they're much rarer these days.
Mind sharing your llama.cpp settings for that?
Using this llama.cpp fork https://github.com/TheTom/llama-cpp-turboquant and mostly copying from this video https://www.youtube.com/watch?v=8F_5pdcD3HY
Haven't had much time to test it beyond asking a few questions and changing some HTML in Cline, so it might be thick as a brick for all I know, but it's still worth trying.
I just tested it with some RISC-V code and it wrote a "mov" instruction several times (RISC-V has no "mov"; the register-copy pseudo-instruction is "mv")... yeah, something needs tuning, maybe.