Comment by unleaded

8 hours ago

  .\llama-server.exe -m ..\Qwen3.6-35B-A3B-UD-Q4_K_M.gguf -ngl 999 --n-cpu-moe 41 -c 262144 --port 8081 --flash-attn on --cache-type-k turbo4 --cache-type-v turbo3 --no-mmap --mlock --host 0.0.0.0 -t 8 -tb 8 -np 1

Using this llama.cpp fork https://github.com/TheTom/llama-cpp-turboquant and mostly copying from this video https://www.youtube.com/watch?v=8F_5pdcD3HY

Haven't had much time to test it other than asking a few questions & changing some HTML in cline so it might be thick as a brick for all I know, but still worth trying

1 comment

unleaded

unleaded 3 hours ago

I just tested it with some risc-v code and it wrote down a "mov" instruction several times.. yeah something needs tuning maybe