Comment by tomekowal

11 hours ago

With qwen3.6-35b-a3b-mtp running in LM Studio on an RTX 3090, I was getting 120 tokens/s. MTP (multi-token prediction) is the key.

I tried coding with Pi and it was much faster than Claude, but on any task that wasn't straightforward it did so-so: either looping on itself or not realising easy-to-spot constraints.

But for exploring codebases and asking questions about big stuff I find it better due to sheer speed.
