Comment by razster
2 hours ago
Thanks for sharing that. I have the same card but with 96 GB of RAM. I use PI.dev to connect to LM Studio. I may have to switch away from LM Studio if that would improve token speed. I think I get around 32-40 t/s with qwen3.6-35b-a3b-genesis-v2-apex-mtp.
You can try tweaking the MoE offload setting. I found the sweet spot after a few tries, and changing it by even 1 can cut speed by a few tok/s. I think I average around 45 tok/s, but sometimes it'll hit 50.
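If you'd rather not hunt for the sweet spot by hand, here's a rough sketch of how the sweep could be scripted. Everything in it is an assumption on my part: `find_best_offload` and `fake_measure` are made-up names, and `fake_measure` is a synthetic stand-in with invented numbers, not a real benchmark. You'd swap in your own function that reloads the model with a given MoE offload value and returns the measured tok/s.

```python
from typing import Callable, Iterable, Tuple


def find_best_offload(
    measure_tok_s: Callable[[int], float],
    candidates: Iterable[int],
    runs: int = 3,
) -> Tuple[int, float]:
    """Try each candidate offload value and return (value, mean tok/s) of the fastest."""
    best_value = None
    best_speed = float("-inf")
    for value in candidates:
        # Average a few runs, since generation speed varies from run to run.
        mean_speed = sum(measure_tok_s(value) for _ in range(runs)) / runs
        if mean_speed > best_speed:
            best_value, best_speed = value, mean_speed
    if best_value is None:
        raise ValueError("candidates must not be empty")
    return best_value, best_speed


def fake_measure(offload: int) -> float:
    """Hypothetical placeholder: peaks at an arbitrary value of 20.

    Replace this with a real measurement against your own setup.
    """
    return 45.0 - 3.0 * abs(offload - 20)


if __name__ == "__main__":
    # Step by 1, because a change of 1 can already cost a few tok/s.
    best, speed = find_best_offload(fake_measure, range(16, 25))
    print(f"best offload value: {best} ({speed:.1f} tok/s)")
```

Stepping through every integer in the range matters here, since the sweet spot is narrow enough that a coarser step could skip right over it.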