Comment by sdrinf

9 hours ago

Just want to echo the recommendation for qwen3.5:9b. This is a smol, thinking, agentic, tool-using, text-image multimodal creature, with very good internal chains of thought. The CoT can sometimes be excessive, but it leads to a very stable decision-making process, even across very large contexts - something we haven't seen from models of this size before.

What's also new here is the VRAM-vs-context trade-off: for 25% of its attention layers, they use the regular KV cache for global coherency, but for the other 75% they use a new KV-cache scheme with linear(!!!!) memory scaling in context length - which means, e.g., ~100K tokens -> ~1.5 GB of VRAM. For the first time, you can do extremely long conversations / document processing on, e.g., a 3060.
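
To make that concrete, here's a back-of-envelope Python sketch. Every dimension in it (layer count, KV-head count, head size) is a placeholder I made up for illustration, not a published spec - the point is that the full-attention quarter dominates the cache growth:

    # Back-of-envelope KV-cache sizing for a hybrid-attention model.
    # All model dimensions below are hypothetical placeholders, NOT
    # published qwen3.5:9b specs; only the shape of the math matters.
    BYTES = 2            # fp16 cache entries
    N_LAYERS = 36        # hypothetical total layer count
    FULL_FRAC = 0.25     # fraction of layers using regular full attention
    KV_HEADS = 4         # hypothetical GQA key/value heads
    HEAD_DIM = 128       # hypothetical per-head dimension

    full_layers = int(N_LAYERS * FULL_FRAC)
    # Regular attention caches one K and one V vector per token per layer.
    bytes_per_token = 2 * full_layers * KV_HEADS * HEAD_DIM * BYTES

    for tokens in (8_000, 32_000, 100_000):
        print(f"{tokens:>7} tokens -> ~{tokens * bytes_per_token / 1e9:.1f} GB")

With these made-up dims you land at ~1.8 GB for 100K tokens - the same ballpark as the figure above, with the linear-cache layers adding comparatively little on top.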

Strong, strong recommend.

I've been building a harness for qwen3.5:9b lately (to better understand how to create agentic tools, and to have fun). I'm not going to use it instead of Opus 4.6 for my day job, but it's remarkably useful for small tasks, and more than snappy enough on my hardware. It's a fun model to experiment with. I was previously using an old model from Meta, and the contrast in capability is pretty crazy.

I like the idea of finding practical uses for it, but so far haven't managed to be creative enough. I'm so accustomed to using these things for programming.

  • What kind of small tasks do you find it's good at? My non-coding use of agents has been related to server admin, and my local-llm use-case is for 24/7 tasks that would be cost-prohibitive. So my best guess for this would be monitoring logs, security cameras, and general home automation tasks.

    • That's about it. The harness is still pretty rudimentary so I'm sure the system could be more capable, and that might reveal more interesting opportunities. I don't really know.

      So far I've got it orchestrating a few instances to dig through logs, local emails, git repositories, and GitHub to figure out what I've been doing and what I need to do. Opus is waayyy better at it, but Qwen does a good enough job to actually be useful.

      I tried having it parse orders in emails and create a CSV of expenses, and that went pretty badly. I'm not sure why. The CSV was invalid and full of bunk entries by the end, almost every time, and it missed a lot of expenses - it would parse out only 5 or 6 items out of 7, for example. Opus and Sonnet do a spectacular job on tasks like this, and do cool things like building a list of emails containing orders and then systematically ensuring each line item within each email is accounted for, even without being prompted to. It's an entirely different category of performance.
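
      A cheap guard for that failure mode is to validate the CSV before accepting it. A minimal sketch (the expense schema here is hypothetical):

          import csv, io

          EXPECTED_COLS = ["date", "vendor", "amount"]  # hypothetical schema

          def validate_expenses_csv(text):
              """Reject structurally broken CSV before it reaches the ledger."""
              reader = csv.DictReader(io.StringIO(text))
              if reader.fieldnames != EXPECTED_COLS:
                  return False, f"bad header: {reader.fieldnames}"
              for lineno, row in enumerate(reader, start=2):
                  # Extra fields land under the None key; missing fields
                  # get None values - either way the row is ragged.
                  if None in row or None in row.values():
                      return False, f"ragged row at line {lineno}"
                  try:
                      float(row["amount"])
                  except ValueError:
                      return False, f"non-numeric amount at line {lineno}"
              return True, "ok"

      That catches invalid output, though not silently dropped line items - for those you'd still need a second pass that counts items per email.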

      Automation is something I'd like to dabble in next, but all I can think of it being useful for is mapping commands (probably from voice) to tool calls (sketched below), and the reality is I'd rather tap a button on my phone. My family might like being able to use voice commands, though. Otherwise, having it parse logs and decide how to act based on thresholds or something would be far better implemented with simple algorithms. It's hard to find truly useful, clear-cut fits for LLMs.
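
      The mapping itself can be a thin dispatch table - a sketch with hypothetical tool names, assuming the model emits a parsed JSON tool request:

          # Command -> tool-call dispatch; both tools are hypothetical examples.
          def lights_off(room):
              print(f"turning off lights in {room}")

          def set_thermostat(celsius):
              print(f"setting thermostat to {celsius}C")

          TOOLS = {"lights_off": lights_off, "set_thermostat": set_thermostat}

          def dispatch(call):
              # call: the model's parsed tool request, e.g. {"name": ..., "args": {...}}
              fn = TOOLS.get(call["name"])
              if fn is None:
                  raise ValueError(f"unknown tool: {call['name']}")
              return fn(**call.get("args", {}))

          dispatch({"name": "lights_off", "args": {"room": "kitchen"}})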

      1 reply →

You can really see the limitations of qwen3.5:9b in its reasoning traces - it's fascinating. When a question "goes bad", sometimes the thinking tokens are WILD - it's like watching Poirot after a head injury.

Example: "what is the air speed velocity of a swallow?" - qwen knew it was a Monty Python gag, but couldn't and didn't figure out which one.

How much difference are you seeing between the standard and Q4 versions in terms of degradation, and is it constant across tasks or more noticeable in some than others?

Correction: not thinking, not a creature.

If it was a creature I would feel some sorrow when I killed it.

If you are feeling sorrow when you reboot a machine running an LLM, get to a psychiatrist ASAP.

  • Do you also require computers to grow legs when they "run"?

    "Thinking" is just a term to describe a process in generative AI where you generate additional tokens in a manner similar to thinking a problem through. It's kind of a tired point to argue against the verb since it's meaning is well understood at this point

    • I am a professional in the information technology field, which is to say a pedantic extremist who believes that words have meanings derived from consensus, and when people alter the meanings, they alter what they believe.

      Using "thinking", "feeling", "alive", or otherwise referring to a current generation LLM as a creature is a mistake which encourages being wrong in further thinking about them.

      6 replies →

  • Rebooting a machine running an LLM isn’t noticed by the LLM.

    Would you feel comfortable digitally torturing it? Giving it a persona and telling it terrible things? Acts of violence against its persona?

    I’m not confident it’s not “feeling” in a way.

    Yes its circuitry is ones and zeros, we understand the mechanics. But at some point, there’s mechanics and meat circuitry behind our thoughts and feelings too.

    It is hubris to confidently state that this is not a form of consciousness.

    • I'm not entirely opposed to the kind of animism that assigns a certain amount of soul, consciousness, or being to everything in a spectrum between a rock and a philosopher... but even so.

      Multiplying large matrices over and over is very much towards the "rock" end of that scale.

      2 replies →