Comment by nsingh2

14 hours ago

Why supply underspecified requirements in the first place? Both models are good at challenging assumptions/edge cases and asking questions to clarify, but seemingly only when explicitly asked (i.e. something like a "brainstorm" skill).

I don't think either harness does enough to encourage the model to challenge all assumptions and ask questions, maybe because users might find it annoying. That step is basically a requirement IMO.

I've found all of the GPT-5 models to be very nit-picky: useful for code review and mathematics (important for my work), but that nit-picking seems to get in the way of "aesthetic" code, e.g. overly defensive code that covers every edge case, however unlikely.

There also seems to be a tradeoff between flexibility and instruction following. In my experience Opus will sometimes ignore instructions but can "fill in the blanks" more, whereas GPT-5.5 follows instructions better but is perhaps more rigid as a result.

fooker 14 hours ago

> Why supply underspecified requirements in the first place?

Because you'd not want to forever loop outside your home when asked to "while you're out, grab some eggs" :)

reactordev 9 hours ago

Meaning why not leave home with your grocery list?

iLoveOncall 12 hours ago

> Why supply underspecified requirements in the first place?

Because the entire reason we use LLMs is to supposedly improve productivity?

nsingh2 11 hours ago
Refusing to sufficiently specify a task and hoping the model guesses correctly is not being productive. Again, these models still don't really ask questions when they should. You have to explicitly tell them to.
Specifying the problem is not extra work separate from solving it. If you skip that step, the ambiguity gets pushed into the model’s assumptions. Then you get a plausible-looking answer to the wrong problem and have to waste time backing out of it.
LLMs are not magic machines that can read your mind.
- iLoveOncall 11 hours ago
  
  My point is that it is much faster for me to solve the problem by writing the code than to write specifications detailed enough for the model to do the right thing in the right way.

antonvs 14 hours ago

> Why supply underspecified requirements in the first place?

Minimizing effort is the obvious answer.

cyberpunk 13 hours ago
Poor trade-off: the model then designs a massive chunk of your solution instead of you. With a good spec, bits of typo’d pseudocode, and slightly more effort than a couple of sentences, they can actually produce passable software.
I think the reason claude has so much mindshare is exactly because it’s more useful to non-developers who wouldn’t know how to describe what an api call executes to his grandmother.
For those who can, I can’t find much of a difference between them. Codex has the slight edge, but that’s all just “feels” to me.
- ben_w 12 hours ago
  
  You call it a poor trade-off, but:
  > I think the reason claude has so much mindshare is exactly because it’s more useful to non-developers who wouldn’t know how to describe what an api call executes to his grandmother.
  This is exactly the benefit for most people.
  Most people don't want to code the app, they just want the app.
  Even people like us who do like coding, we can only think of all of these things within a domain that we already know; somebody who writes shaders for games isn't likely to know or care much about the ins and outs of database development or how healthcare privacy law and KYC interact with zero-knowledge proofs.
  (Of course, if the AI knows about these things and then completely fails to make use of that knowledge, that's still a fail.)