Comment by rcarmo

10 months ago

Coincidentally, I just managed to get Qwen3 to go into a loop by using a fairly simple prompt:

"create a python decorator that uses a trie to do mqtt topic routing"

phi4-reasoning works, but I think the code is buggy

phi4-mini-reasoning freaks out

qwen3:30b starts looping and forgets about the decorator

mistral-small gets straight to the point and the code seems sane

https://mastodon.social/@rcarmo/114433075043021470

I regularly use Copilot models, and they can manage this without too many issues (Claude 3.7 and Gemini output usable code with tests), but local models seem to not have the ability to do it quite yet.


datpuz 10 months ago

Here's qwen-30b-a3b's response to your prompt when I worded it better:

The prompt was:

"Create a Python decorator that registers functions as handlers for MQTT topic patterns (including + and # wildcards). Internally, use a trie to store the topic patterns and match incoming topic strings to the correct handlers. Provide an example showing how to register multiple handlers and dispatch a message to the correct one based on an incoming topic."

https://pastebin.com/wefw7X2h
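
For anyone who wants a reference point when judging the model outputs, here is a minimal sketch of what the prompt asks for. It is not what any of the models produced; `TopicRouter`, `route`, `match`, and `dispatch` are names made up for illustration. It assumes incoming topics never contain wildcard characters, and it does not special-case `$`-prefixed topics.

```python
from typing import Callable, Dict, List

Handler = Callable[[str, bytes], None]


class _Node:
    """One topic level in the trie."""

    def __init__(self) -> None:
        self.children: Dict[str, "_Node"] = {}
        self.handlers: List[Handler] = []


class TopicRouter:
    """Hypothetical router: stores MQTT topic patterns in a trie."""

    def __init__(self) -> None:
        self._root = _Node()

    def route(self, pattern: str) -> Callable[[Handler], Handler]:
        """Decorator that registers a handler for a pattern ('+' and '#' allowed)."""

        def decorator(func: Handler) -> Handler:
            node = self._root
            for level in pattern.split("/"):
                node = node.children.setdefault(level, _Node())
            node.handlers.append(func)
            return func

        return decorator

    def match(self, topic: str) -> List[Handler]:
        """Return every handler whose pattern matches the topic."""
        found: List[Handler] = []
        self._walk(self._root, topic.split("/"), 0, found)
        return found

    def _walk(self, node: _Node, levels: List[str], i: int, found: List[Handler]) -> None:
        # '#' matches all remaining levels, including none (the parent itself).
        if "#" in node.children:
            found.extend(node.children["#"].handlers)
        if i == len(levels):
            found.extend(node.handlers)
            return
        # Try the literal level first, then the single-level wildcard.
        for key in (levels[i], "+"):
            child = node.children.get(key)
            if child is not None:
                self._walk(child, levels, i + 1, found)

    def dispatch(self, topic: str, payload: bytes) -> int:
        """Call all matching handlers and return how many ran."""
        handlers = self.match(topic)
        for handler in handlers:
            handler(topic, payload)
        return len(handlers)


router = TopicRouter()
calls: List[str] = []


@router.route("home/+/temperature")
def on_temperature(topic: str, payload: bytes) -> None:
    calls.append("temperature")


@router.route("home/#")
def on_home(topic: str, payload: bytes) -> None:
    calls.append("home")


router.dispatch("home/kitchen/temperature", b"21.5")
```

The decorator returns the original function unchanged, so handlers stay directly callable and can be registered under more than one pattern.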

rcarmo 10 months ago
I went back and used your prompt, and it is still looping:
https://pastebin.com/VfmhCTFm
- anon373839 10 months ago
  
  Are you using Ollama? If so, the issue may be Ollama's default context length: just 2,048 tokens. Ollama truncates the rest of the context silently, so "thinking" models cannot work with the default settings.
  If you are using Ollama, try explicitly setting the `num_ctx` parameter in your request to something higher like 16k or 32k, and then see if you still encounter the looping. I haven't run into that behavior once with this model.
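
A rough sketch of what that looks like when calling Ollama's HTTP API directly, assuming a local server on the default port and the `qwen3:30b` tag mentioned above. The helper names `build_request` and `generate` are made up for illustration; `num_ctx` goes inside the request's `options` object.

```python
import json
import urllib.request

# Ollama's default local endpoint; adjust if your server runs elsewhere.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt: str, model: str = "qwen3:30b", num_ctx: int = 16384) -> dict:
    """Build a generate payload that raises the context window explicitly."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,  # return one JSON object instead of a stream
        "options": {"num_ctx": num_ctx},
    }


def generate(prompt: str, **kwargs) -> str:
    """Send the request to a running Ollama server and return the generated text."""
    body = json.dumps(build_request(prompt, **kwargs)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

Calling `generate("...", num_ctx=32768)` would try the 32k setting; larger contexts use more memory, so whether it fits depends on your hardware.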
  

datpuz 10 months ago

I think your prompt is bad. Still impressive that Claude 3.7 handled your bad prompt, but qwen3 had no problem with this prompt:

Create a Python decorator that registers functions as handlers for MQTT topic patterns (including + and # wildcards). Internally, use a trie to store the topic patterns and match incoming topic strings to the correct handlers. Provide an example showing how to register multiple handlers and dispatch a message to the correct one based on an incoming topic.

rcarmo 10 months ago
I purposefully used exactly the same thing I did with Claude and Gemini to see how the models dealt with ambiguity. It shouldn't have degraded the chain of thought to the point where it starts looping.
- 101011 9 months ago
  
  The trick shouldn't be to generate a litmus test for agentic development. It's to change your workflow to game-plan solutions and decompose problems (like you would break a Jira epic into stories), and THEN have it build something for you.

avetiszakharyan 10 months ago

Is there an additional system prompt before that? Or can I repro with just this?

rcarmo 10 months ago

Just that. I purposefully used exactly the same thing I did with Claude and Gemini to see how the models dealt with ambiguity.

GaggiX 10 months ago

You should probably try a different quantization. Have you tried UD-Q4_K_XL?

throwaway314155 10 months ago

[flagged]