Comment by Twirrim
6 days ago
I'm getting about 15-20 tok/s with a 128k context window using the Q3_K_S version.
For running the server:
$ ./llama.cpp/build/bin/llama-server --host 0.0.0.0 \
--port 8001 \
-hf unsloth/Qwen3.5-35B-A3B-GGUF:Q3_K_S \
--ctx-size 131072 \
--temp 0.6 \
--top-p 0.95 \
--top-k 20 \
--min-p 0.00
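
llama-server exposes an OpenAI-compatible HTTP API, so once it's up you can exercise those same sampling settings from any client. A minimal Python sketch, assuming the host/port from the command above and the standard /v1/chat/completions route (the `ask` helper name is just for illustration):

```python
import json
import urllib.request

# Sampling settings mirroring the llama-server flags above.
# llama.cpp's server also accepts top_k and min_p as extensions
# to the usual OpenAI fields.
SETTINGS = {"temperature": 0.6, "top_p": 0.95, "top_k": 20, "min_p": 0.0}

def build_request(prompt: str) -> dict:
    """Build a chat-completion payload carrying the sampling settings."""
    return {
        "messages": [{"role": "user", "content": prompt}],
        **SETTINGS,
    }

def ask(prompt: str, host: str = "localhost", port: int = 8001) -> str:
    """POST the prompt to the running server and return the reply text."""
    req = urllib.request.Request(
        f"http://{host}:{port}/v1/chat/completions",
        data=json.dumps(build_request(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

No API key is needed for a local llama-server unless you started it with --api-key.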