Comment by moffkalast
6 days ago
Obviously it's not going to match the quality of a paid-tier, 2T-parameter SOTA model, but it can probably roughly match Haiku at the very least. And for tasks that aren't super complex, that's already enough.
Personally though, I find Qwen useless for anything but coding tasks because of its insufferable sycophancy. It's like 4o dialed up to 20: every reply starts with "You are absolutely right" with zero self-awareness. And for coding, only the best model available is usually sensible to use, otherwise it's just wasted time.
That's why I start any prompt to Qwen 3.5 with:
persona: brief rude senior
I'm using:
persona: drunken sailor
Because then at least the tone matches the quality of the output and I'm reminded of what I can expect.
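The persona trick above amounts to prepending one line to the system prompt. A minimal sketch using the common OpenAI-style chat message schema; the helper name `with_persona` and the rest of the system prompt are my own placeholders, not anything Qwen-specific:

```python
def with_persona(persona: str, system_prompt: str, user_msg: str) -> list[dict]:
    """Build a chat message list with a 'persona: ...' line prepended
    to the system prompt, as described in the comment above."""
    return [
        {"role": "system", "content": f"persona: {persona}\n{system_prompt}"},
        {"role": "user", "content": user_msg},
    ]

messages = with_persona(
    "brief rude senior",
    "You are a coding assistant.",
    "Why does my loop never terminate?",
)
# The persona line is the first thing the model sees in the system prompt:
print(messages[0]["content"].splitlines()[0])  # → persona: brief rude senior
```

Whether the model actually honors the persona depends on the model and how it was tuned; the one-line prefix just makes the instruction cheap to toggle.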
But then what do you do with it early in the morning?
Does it tend to break out into sea shanties?
This also works
persona: emotionless vulcan
Does "persona: air traffic controller" work?
If I could set up a voice assistant that actually verifies commands, instead of assuming it heard everything correctly 100% of the time, it might even be useful.
persona: fair witness
https://fairwitness.bot/
You just paste that YAML in? Is this an official LLM config format that gets parsed out?
wow I had no idea you could do that. this changes everything for me.
persona: party delegate in a rural province who doesn't want to be there
gamechanger
>for coding, only the best model available is usually sensible to use otherwise it's just wasted time.
I had the opposite experience. I gave a small model and a big model the same 3 tasks. The small model was done in 30 sec; the large model took 90 sec (3x longer) and cost 3x more. Depending on the task, the benchies just tell you how much you're over-paying and over-waiting.
If you use the models the way we execute coding tasks, older models outperform the latest ones. There's a prep tax that happens before we even start coding: extract requirements from tools, context from code, comments and decisions from conversations, ACs from Jira/Notion, stitch them together, design tailored coding standards, and then code. If you automate the prep tax, the generated code is close to production-ready and may require 1-2 iterations at most. I gave it a try and compared the results: the output was 92% accurate, while the same tasks on Claude Code gave 68% accuracy. The prep tax is the key here.
oh? I used it in t3 chat before, with the traits `concise`, `avoid unnecessary flattery/affirmation/praise`, `witty`, `feel free to match potential user's sarcasm`,
and it does use that sarcasm permission at times (I still dislike the way it generally communicates)
> I find Qwen useless for anything but coding tasks because if its insufferable sycophancy
We've been using Qwen at work since 2.0 for text/image/video analysis (summarization, categorization, NER, etc.), and I think it's impressive. We ask for JSON and always add "do not explain your response".
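The "JSON only, do not explain" pattern above can be sketched roughly like this. The system prompt wording, the entity schema, and the simulated reply are all my own illustrative assumptions (real model output varies, and models sometimes wrap JSON in code fences anyway, so the parser tolerates that):

```python
import json

def build_messages(text: str) -> list[dict]:
    """OpenAI-style chat messages asking for JSON-only NER output."""
    return [
        {"role": "system", "content": (
            'Extract named entities from the text. '
            'Respond with JSON only, in the form {"entities": [...]}. '
            'Do not explain your response.')},
        {"role": "user", "content": text},
    ]

def parse_reply(reply: str) -> dict:
    """Parse the model reply, tolerating stray ```json code fences."""
    cleaned = reply.strip().removeprefix("```json").removesuffix("```").strip()
    return json.loads(cleaned)

# Simulated model reply (a real call would go to your inference endpoint):
reply = '```json\n{"entities": ["Qwen", "NER"]}\n```'
print(parse_reply(reply))  # → {'entities': ['Qwen', 'NER']}
```

Keeping the instruction in the system prompt and validating with `json.loads` on every reply makes failures explicit instead of silently passing prose downstream.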