Comment by datpuz
10 months ago
I think your prompt is bad. Still impressive that Claude 3.7 handled your bad prompt, but qwen3 had no problem with this prompt:
Create a Python decorator that registers functions as handlers for MQTT topic patterns (including + and # wildcards). Internally, use a trie to store the topic patterns and match incoming topic strings to the correct handlers. Provide an example showing how to register multiple handlers and dispatch a message to the correct one based on an incoming topic.
I purposefully used exactly the same thing I did with Claude and Gemini to see how the models dealt with ambiguity. It shouldn't have degraded the chain of thought to the point where it starts looping.
The trick shouldn't be to try and generate a litmus test for agentic development, it's to change your workflow to game-plan solutions and decompose problems (like you would a jira epic to stories), and THEN have it build something for you.