Comment by smokel
6 days ago
> This outperforms the majority of online llm services
I assume you mean outperforms in speed on the same model, not in usability compared to other more capable models.
(For those who are getting their hopes up on using local LLMs to be any replacement for Sonnet or Opus.)
Obviously it's not going to match the quality of a paid-tier, 2T-sized SOTA model, but it can probably roughly match Haiku at the very least. And for tasks that aren't super complex, that's already enough.
Personally though, I find Qwen useless for anything but coding tasks because of its insufferable sycophancy. It's like 4o dialed up to 20: every reply starts with "You are absolutely right", with zero self-awareness. And for coding, only the best model available is usually worth using; otherwise it's just wasted time.
That's why I start any prompt to Qwen 3.5 with:
persona: brief rude senior
I'm using:
persona: drunken sailor
Because then at least the tone matches the quality of the output and I'm reminded of what I can expect.
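For what it's worth, the trick above is just prepending a `persona:` line to every prompt. A minimal sketch of that (the helper name and default persona are made up for illustration):

```python
# Sketch: prefix each prompt with a persona directive before sending it
# to the model. Any short descriptor works in place of the default.

def with_persona(prompt: str, persona: str = "brief rude senior") -> str:
    """Return the prompt with a persona line prepended, as in the comments above."""
    return f"persona: {persona}\n\n{prompt}"

print(with_persona("Why does my regex backtrack so badly?"))
```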
This also works
persona: emotionless vulcan
persona: fair witness
https://fairwitness.bot/
wow I had no idea you could do that. this changes everything for me.
persona: party delegate in a rural province who doesn't want to be there
gamechanger
>for coding, only the best model available is usually sensible to use otherwise it's just wasted time.
I had the opposite experience. I gave a small model and a big model the same 3 tasks. The small model was done in 30 sec; the large model took 90 sec, 3x longer, and cost 3x more. Depending on the task, the benchies just tell you how much you are over-paying and over-waiting.
If you use the models the way we execute coding tasks, older models outperform the latest ones. There's a prep tax that happens even before we start coding: extract requirements from tools, context from code, comments and decisions from conversations, ACs from Jira/Notion, stitch them together, design tailored coding standards, and then code. If you automate the prep tax, the generated code is close to production-ready and may require 1-2 iterations at most. I gave it a try and compared the results: the output was 92% accurate, while the same work in Claude Code gave 68% accuracy. The prep tax is the key here.
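A rough sketch of what automating that prep tax could look like: gather the pieces, then stitch them into one prompt before any code is generated. The function name and all inputs here are hypothetical placeholders, not a real pipeline.

```python
# Sketch: assemble requirements, code context, ACs, and standards into
# a single prompt, so the model sees the "prep" up front.

def build_prep_prompt(requirements: str, code_context: str,
                      acceptance_criteria: list[str],
                      standards: str) -> str:
    acs = "\n".join(f"- {ac}" for ac in acceptance_criteria)
    return (
        f"Requirements:\n{requirements}\n\n"
        f"Relevant code:\n{code_context}\n\n"
        f"Acceptance criteria:\n{acs}\n\n"
        f"Coding standards:\n{standards}\n\n"
        "Implement the change. Aim for production-ready code."
    )

prompt = build_prep_prompt(
    "Add retry logic to the HTTP client.",
    "def fetch(url): ...",
    ["Retries 3 times", "Backs off exponentially"],
    "PEP 8; type hints on public functions.",
)
print(prompt.splitlines()[0])
```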
oh? I used it in t3 chat before, with traits `concise` `avoid unnecessary flattery/affirmation/praise` `witty` `feel free to match potential user's sarcasm`
and it does use that sarcasm permission at times (I still dislike the way it generally communicates)
> I find Qwen useless for anything but coding tasks because of its insufferable sycophancy
We've been using Qwen at work since 2.0 for text/image/video analysis (summarization, categorization, NER, etc.); I think it's impressive. We ask for JSON and always add "do not explain your response".
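That pattern, ask for JSON only and parse strictly, can be sketched like this (the schema and system text are illustrative, not our actual prompts):

```python
import json

# Sketch: instruct the model to emit only JSON, then parse strictly so
# a reply containing prose fails fast instead of leaking downstream.
SYSTEM = (
    'Return only valid JSON matching {"category": str, "entities": [str]}. '
    "Do not explain your response."
)

def parse_reply(reply: str) -> dict:
    # json.loads raises ValueError on anything that is not pure JSON.
    return json.loads(reply)

print(parse_reply('{"category": "news", "entities": ["Qwen"]}'))
```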
You can replace Sonnet and Opus with local models, you just need to run the larger ones.