Comment by moffkalast
6 days ago
Obviously it's not going to match the quality of a paid-tier, 2T-parameter SOTA model, but it can probably roughly match Haiku at the very least. And for tasks that aren't super complex, that's already enough.
Personally though, I find Qwen useless for anything but coding tasks because of its insufferable sycophancy. It's like 4o dialed up to 20: every reply starts with "You are absolutely right" with zero self-awareness. And for coding, only the best model available is usually sensible to use, otherwise it's just wasted time.
That's why I start any prompt to Qwen 3.5 with:
persona: brief rude senior
I'm using:
persona: drunken sailor
Because then at least the tone matches the quality of the output and I'm reminded of what I can expect.
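The persona trick above amounts to prepending one line to the system prompt. A minimal sketch using the common OpenAI-style chat message schema; the helper name `with_persona` and the rest of the system prompt are my own placeholders, not anything Qwen-specific:

```python
def with_persona(persona: str, system_prompt: str, user_msg: str) -> list[dict]:
    """Build a chat message list with a 'persona: ...' line prepended
    to the system prompt, as described in the comment above."""
    return [
        {"role": "system", "content": f"persona: {persona}\n{system_prompt}"},
        {"role": "user", "content": user_msg},
    ]

messages = with_persona(
    "brief rude senior",
    "You are a coding assistant.",
    "Why does my loop never terminate?",
)
# The persona line is the first thing the model sees in the system prompt:
print(messages[0]["content"].splitlines()[0])  # → persona: brief rude senior
```

Whether the model actually honors the persona depends on the model and how it was tuned; the one-line prefix just makes the instruction cheap to toggle.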
But then what do you do with it early in the morning?
Does it tend to break out into sea shanties?
This also works
persona: emotionless vulcan
Does "persona: air traffic controller" work?
If I could set up a voice assistant that actually verifies commands, instead of assuming it heard everything correctly 100% of the time, it might even be useful.
persona: fair witness
https://fairwitness.bot/
You just paste that YAML in? Is this an official LLM config format that gets parsed out?
wow I had no idea you could do that. this changes everything for me.
persona: party delegate in a rural province who doesn't want to be there
gamechanger
>for coding, only the best model available is usually sensible to use otherwise it's just wasted time.
I had the opposite experience. I gave a small model and a big model the same 3 tasks. The small model was done in 30 sec; the large model took 90 sec (3x longer) and cost 3x more. Depending on the task, the benchies just tell you how much you're over-paying and over-waiting.
If you use the models the way we execute coding tasks, older models outperform the latest ones. There's a prep tax that happens before we even start coding: extract requirements from tools, context from code, comments and decisions from conversations, ACs from Jira/Notion, stitch them together, design tailored coding standards, and then code. If you automate the prep tax, the generated code is close to production-ready and may require 1-2 iterations at most. I gave it a try and compared the results: the output was 92% accurate, while the same tasks on Claude Code gave 68% accuracy. The prep tax is the key here.
oh? I used it in t3 chat before, with the traits `concise`, `avoid unnecessary flattery/affirmation/praise`, `witty`, `feel free to match potential user's sarcasm`,
and it does use that sarcasm permission at times (I still dislike the way it generally communicates)
> I find Qwen useless for anything but coding tasks because if its insufferable sycophancy
We've been using Qwen at work since 2.0 for text/image/video analysis (summarization, categorization, NER, etc.), and I think it's impressive. We ask for JSON and always add "do not explain your response".
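The "JSON only, do not explain" pattern above can be sketched roughly like this. The system prompt wording, the entity schema, and the simulated reply are all my own illustrative assumptions (real model output varies, and models sometimes wrap JSON in code fences anyway, so the parser tolerates that):

```python
import json

def build_messages(text: str) -> list[dict]:
    """OpenAI-style chat messages asking for JSON-only NER output."""
    return [
        {"role": "system", "content": (
            'Extract named entities from the text. '
            'Respond with JSON only, in the form {"entities": [...]}. '
            'Do not explain your response.')},
        {"role": "user", "content": text},
    ]

def parse_reply(reply: str) -> dict:
    """Parse the model reply, tolerating stray ```json code fences."""
    cleaned = reply.strip().removeprefix("```json").removesuffix("```").strip()
    return json.loads(cleaned)

# Simulated model reply (a real call would go to your inference endpoint):
reply = '```json\n{"entities": ["Qwen", "NER"]}\n```'
print(parse_reply(reply))  # → {'entities': ['Qwen', 'NER']}
```

Keeping the instruction in the system prompt and validating with `json.loads` on every reply makes failures explicit instead of silently passing prose downstream.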