Comment by tomekowal
11 hours ago
With qwen3.6-35b-a3b-mtp running in LM Studio on an RTX 3090, I was getting 120 tokens/s. MTP (multi-token prediction) is the key.
I tried coding with Pi and it was much faster than Claude, but on any task that wasn't straightforward it did only so-so: it either got stuck in a loop or missed easy-to-spot constraints.
But for exploring codebases and asking questions about big stuff I find it better due to sheer speed.