Comment by scuff3d
10 hours ago
How anybody can read stuff like this and still take all this seriously is beyond me. This is becoming the engineering equivalent of astrology.
10 hours ago
How anybody can read stuff like this and still take all this seriously is beyond me. This is becoming the engineering equivalent of astrology.
Anthropic recommends doing magic invocations: https://simonwillison.net/2025/Apr/19/claude-code-best-pract...
It's easy to know why they work. The magic invocation increases test-time compute (easy to verify yourself - try!). And an increase in test-time compute is demonstrated to increase answer correctness (see any benchmark).
It might surprise you to know that the only different between GPT 5.2-low and GPT 5.2-xhigh is one of these magic invocations. But that's not supposed to be public knowledge.
I think this was more of a thing on older models. Since I started using Opus 4.5 I have not felt the need to do this.
The evolution of software engineering is fascinating to me. We started by coding in thin wrappers over machine code and then moved on to higher-level abstractions. Now, we've reached the point where we discuss how we should talk to a mystical genie in a box.
I'm not being sarcastic. This is absolutely incredible.
And I've been had a long enough to go through that whole progression. Actually from the earlier step of writing machine code. It's been and continues to be a fun journey which is why I'm still working.
We have tests and benchmarks to measure it though.
Feel free to run your own tests and see if the magic phrases do or do not influence the output. Have it make a Todo webapp with and without those phrases and see what happens!
That's not how it works. It's not on everyone else to prove claims false, it's on you (or the people who argue any of this had a measurable impact) to prove it actually works. I've seen a bunch of articles like this, and more comments. Nobody I've ever seen has produced any kind of measurable metrics of quality based on one approach vs another. It's all just vibes.
Without something quantifiable it's not much better then someone who always wears the same jersey when their favorite team plays, and swears they play better because of it.
These coding agents are literally Language Models. The way you structure your prompting language affect the actual output.
If you read the transformer paper, or get any book on NLP, you will see that this is not magic incantation; it's purely the attention mechanism at work. Or you can just ask Gemini or Claude why these prompts work.
But I get the impression from your comment that you have a fixed idea, and you're not really interested in understanding how or why it works.
If you think like a hammer, everything will look like a nail.
6 replies →
Do you actively use LLMs to do semi-complex coding work? Because if not, it will sound mumbo-jumbo to you. Everyone else can nod along and read on, as they’ve experienced all of it first hand.
3 replies →