Comment by smokel
6 days ago
> This outperforms the majority of online llm services
I assume you mean outperforms in speed on the same model, not in usability compared to other more capable models.
(For those who are getting their hopes up on using local LLMs to be any replacement for Sonnet or Opus.)
Obviously it's not going to match the quality of a paid-tier, 2T-sized SOTA model, but it can probably roughly match Haiku at the very least. And for tasks that aren't super complex, that's already enough.
Personally though, I find Qwen useless for anything but coding tasks because of its insufferable sycophancy. It's like 4o dialed up to 20: every reply starts with "You are absolutely right", with zero self-awareness. And for coding, only the best model available is usually worth using; otherwise it's just wasted time.
That's why I start any prompt to Qwen 3.5 with:
persona: brief rude senior
I'm using:
persona: drunken sailor
Because then at least the tone matches the quality of the output and I'm reminded of what I can expect.
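For what it's worth, the trick above is just prepending a `persona:` line to every prompt. A minimal sketch of that (the helper name and default persona are made up for illustration):

```python
# Sketch: prefix each prompt with a persona directive before sending it
# to the model. Any short descriptor works in place of the default.

def with_persona(prompt: str, persona: str = "brief rude senior") -> str:
    """Return the prompt with a persona line prepended, as in the comments above."""
    return f"persona: {persona}\n\n{prompt}"

print(with_persona("Why does my regex backtrack so badly?"))
```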
This also works
persona: emotionless vulcan
persona: fair witness
https://fairwitness.bot/
wow I had no idea you could do that. this changes everything for me.
persona: party delegate in a rural province who doesn't want to be there
gamechanger
>for coding, only the best model available is usually sensible to use otherwise it's just wasted time.
I had the opposite experience. I gave a small model and a big model the same 3 tasks. The small model was done in 30 sec; the large model took 90 sec, 3x longer, and cost 3x more. Depending on the task, the benchies just tell you how much you are over-paying and over-waiting.
If you use the models the way we execute coding tasks, older models outperform the latest ones. There's a prep tax that happens even before we start coding: extract requirements from tools, context from code, comments and decisions from conversations, ACs from Jira/Notion, stitch them together, design tailored coding standards, and then code. If you automate the prep tax, the generated code is close to production-ready and may require 1-2 iterations at most. I gave it a try and compared the results: the output was 92% accurate, while the same work in Claude Code gave 68% accuracy. The prep tax is the key here.
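A rough sketch of what automating that prep tax could look like: gather the pieces, then stitch them into one prompt before any code is generated. The function name and all inputs here are hypothetical placeholders, not a real pipeline.

```python
# Sketch: assemble requirements, code context, ACs, and standards into
# a single prompt, so the model sees the "prep" up front.

def build_prep_prompt(requirements: str, code_context: str,
                      acceptance_criteria: list[str],
                      standards: str) -> str:
    acs = "\n".join(f"- {ac}" for ac in acceptance_criteria)
    return (
        f"Requirements:\n{requirements}\n\n"
        f"Relevant code:\n{code_context}\n\n"
        f"Acceptance criteria:\n{acs}\n\n"
        f"Coding standards:\n{standards}\n\n"
        "Implement the change. Aim for production-ready code."
    )

prompt = build_prep_prompt(
    "Add retry logic to the HTTP client.",
    "def fetch(url): ...",
    ["Retries 3 times", "Backs off exponentially"],
    "PEP 8; type hints on public functions.",
)
print(prompt.splitlines()[0])
```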
oh? I used it in t3 chat before, with traits `concise` `avoid unnecessary flattery/affirmation/praise` `witty` `feel free to match potential user's sarcasm`
and it does use that sarcasm permission at times (I still dislike the way it generally communicates)
> I find Qwen useless for anything but coding tasks because of its insufferable sycophancy
We've been using Qwen at work since 2.0 for text/image/video analysis (summarization, categorization, NER, etc.); I think it's impressive. We ask for JSON and always add "do not explain your response".
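That pattern, ask for JSON only and parse strictly, can be sketched like this (the schema and system text are illustrative, not our actual prompts):

```python
import json

# Sketch: instruct the model to emit only JSON, then parse strictly so
# a reply containing prose fails fast instead of leaking downstream.
SYSTEM = (
    'Return only valid JSON matching {"category": str, "entities": [str]}. '
    "Do not explain your response."
)

def parse_reply(reply: str) -> dict:
    # json.loads raises ValueError on anything that is not pure JSON.
    return json.loads(reply)

print(parse_reply('{"category": "news", "entities": ["Qwen"]}'))
```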
You can replace Sonnet and Opus with local models, you just need to run the larger ones.