Comment by rcarmo

10 months ago

Coincidentally, I just managed to get Qwen3 to go into a loop by using a fairly simple prompt:

"create a python decorator that uses a trie to do mqtt topic routing"

phi4-reasoning works, but I think the code is buggy

phi4-mini-reasoning freaks out

qwen3:30b starts looping and forgets about the decorator

mistral-small gets straight to the point and the code seems sane

https://mastodon.social/@rcarmo/114433075043021470

I regularly use Copilot models, and they can manage this without too many issues (Claude 3.7 and Gemini output usable code with tests), but local models don't seem able to do it quite yet.

Here's qwen-30b-a3b's response to your prompt when I worded it better:

The prompt was:

"Create a Python decorator that registers functions as handlers for MQTT topic patterns (including + and # wildcards). Internally, use a trie to store the topic patterns and match incoming topic strings to the correct handlers. Provide an example showing how to register multiple handlers and dispatch a message to the correct one based on an incoming topic."

https://pastebin.com/wefw7X2h
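
For anyone who doesn't want to click through, here is a rough sketch of the kind of thing the prompt asks for. This is not the model's output from the pastebin, just a minimal illustration. The names `TopicRouter`, `route`, `match`, and `dispatch` are made up for this example, and it assumes incoming topics never contain wildcard characters. It also skips the MQTT rule about `$`-prefixed topics.

```python
from typing import Any, Callable, Dict, List


class _Node:
    """One topic level in the trie."""

    def __init__(self) -> None:
        self.children: Dict[str, "_Node"] = {}
        self.handlers: List[Callable] = []


class TopicRouter:
    """Hypothetical router; the class and method names are illustrative only."""

    def __init__(self) -> None:
        self._root = _Node()

    def route(self, pattern: str) -> Callable:
        """Decorator that registers a function under an MQTT topic pattern."""
        def decorator(func: Callable) -> Callable:
            node = self._root
            # Each topic level (including '+' and '#') becomes one trie edge.
            for level in pattern.split("/"):
                node = node.children.setdefault(level, _Node())
            node.handlers.append(func)
            return func
        return decorator

    def match(self, topic: str) -> List[Callable]:
        """Return every handler whose pattern matches the concrete topic."""
        found: List[Callable] = []
        self._walk(self._root, topic.split("/"), 0, found)
        return found

    def _walk(self, node: _Node, levels: List[str], i: int,
              found: List[Callable]) -> None:
        # '#' matches all remaining levels, including none (the parent level).
        if "#" in node.children:
            found.extend(node.children["#"].handlers)
        if i == len(levels):
            found.extend(node.handlers)
            return
        # Follow the exact level and the single-level wildcard '+'.
        for key in (levels[i], "+"):
            child = node.children.get(key)
            if child is not None:
                self._walk(child, levels, i + 1, found)

    def dispatch(self, topic: str, payload: Any) -> List[Any]:
        """Call every matching handler and collect the return values."""
        return [handler(topic, payload) for handler in self.match(topic)]


router = TopicRouter()


@router.route("home/+/temperature")
def on_temperature(topic: str, payload: Any) -> str:
    return f"temperature {payload} on {topic}"


@router.route("home/#")
def on_anything_home(topic: str, payload: Any) -> str:
    return f"home event on {topic}"
```

Calling `router.dispatch("home/kitchen/temperature", "21")` runs both handlers, while `router.dispatch("home/kitchen/door", "open")` only reaches `on_anything_home`.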

  • I went back and used your prompt, and it is still looping:

    https://pastebin.com/VfmhCTFm

    • Are you using Ollama? If so, the issue may be Ollama's default context length: just 2,048 tokens. Ollama truncates the rest of the context silently, so "thinking" models cannot work with the default settings.

      If you are using Ollama, try explicitly setting the `num_ctx` parameter in your request to something higher like 16k or 32k, and then see if you still encounter the looping. I haven't run into that behavior once with this model.
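
In case it helps, a minimal sketch of setting `num_ctx` per request through Ollama's HTTP API. It assumes a default local install listening on `localhost:11434` and uses only the standard library. The helper names `build_payload` and `generate` are made up for this example, and 16384 is just the "16k" suggestion above, not a recommended value.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_payload(model: str, prompt: str, num_ctx: int = 16384) -> dict:
    """Build a generate request that overrides the default context length."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,  # ask for a single JSON response instead of a stream
        "options": {"num_ctx": num_ctx},
    }


def generate(model: str, prompt: str, num_ctx: int = 16384) -> str:
    """Send the request to a local Ollama server and return the generated text."""
    data = json.dumps(build_payload(model, prompt, num_ctx)).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL,
        data=data,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]
```

With the server running, something like `generate("qwen3:30b", "your prompt here", num_ctx=32768)` should exercise the larger window.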


I think your prompt is bad. Still impressive that Claude 3.7 handled your bad prompt, but qwen3 had no problem with this prompt:

Create a Python decorator that registers functions as handlers for MQTT topic patterns (including + and # wildcards). Internally, use a trie to store the topic patterns and match incoming topic strings to the correct handlers. Provide an example showing how to register multiple handlers and dispatch a message to the correct one based on an incoming topic.

  • I purposefully used exactly the same thing I did with Claude and Gemini to see how the models dealt with ambiguity. It shouldn't have degraded the chain of thought to the point where it starts looping.

    • The trick shouldn't be to come up with a litmus test for agentic development. It's to change your workflow: game-plan the solution and decompose the problem (like you would break a Jira epic into stories), and THEN have it build something for you.