Comment by SillyUsername

3 months ago

o3 has been the worst of the three new models for me.

Ask it to create a TypeScript server-side hello world.

It produces a JS example.

Telling it that's incorrect (with no further detail) results in it iterating through all sorts of mistakes.

In 20 iterations it never once asked me what was incorrect.

In contrast, o4-mini asked me after 5 iterations. o4-mini-high asked me after 1, but narrowed the question to "is it incorrect due to choice of runtime?" rather than "what's incorrect?"

I told it to "ask the right question" based on my statement ("it is incorrect") and it correctly asked "what is wrong with it?" before I pointed out that there were no TypeScript types.

This is the critical thinking we need, not just (incorrect) reasoning.
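
For reference, here is a minimal sketch of the kind of answer I was expecting, i.e. one with actual type annotations. It assumes Node.js with `@types/node` installed. The names `HelloResponse`, `buildHello`, `handleRequest` and `startServer`, and the port in the usage note, are my own illustrative choices and not output from any of the models.

```typescript
import { createServer } from "node:http";
import type { IncomingMessage, ServerResponse, Server } from "node:http";

// Hypothetical shape for the reply; only here to show an explicit type.
interface HelloResponse {
  status: number;
  body: string;
}

// Pure function, so the greeting can be checked without opening a socket.
function buildHello(name: string = "world"): HelloResponse {
  return { status: 200, body: `Hello, ${name}!` };
}

// Typed request handler: the parameter and return annotations are what
// distinguish this from the plain JS version.
function handleRequest(_req: IncomingMessage, res: ServerResponse): void {
  const { status, body } = buildHello();
  res.writeHead(status, { "Content-Type": "text/plain" });
  res.end(body);
}

// Nothing listens until this is called explicitly.
function startServer(port: number): Server {
  return createServer(handleRequest).listen(port);
}
```

Calling `startServer(3000)` would serve the greeting on port 3000. The unannotated JS version may well compile as TS too, which is the whole argument further down the thread, but this is what I would expect a reviewer to accept.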


echoangle 3 months ago

> Ask it to create a Typescript server side hello world. It produces a JS example.

Well TS is a strict superset of JS so it’s technically correct (which is the best kind of correct) to produce JS when asked for a TS version. So you’re the one that’s wrong.

pton_xd 3 months ago

> Well TS is a strict superset of JS so it’s technically correct (which is the best kind of correct) to produce JS when asked for a TS version. So you’re the one that’s wrong.

Try that one at your next standup and see how it goes over with the team.

redox99 3 months ago

He's not wrong. If the model doesn't give you what you want, it's a worthless model. If the model is like the genie from the lamp, and gives you a shitty but technically correct answer, it's really bad.
- echoangle 3 months ago
  
  > If the model doesn't give you what you want, it's a worthless model.
  Yeah, if you’re into playing stupid mind games while not even being right.
  If you stick to just voicing your needs, it’s fine. And I don’t think the TS/JS story shows a lack of reasoning that would be relevant for other use cases.
  
SillyUsername 3 months ago

Well yes, but still the name should give it away and you'll be shot during PRs if you submit JS as TS :D
The fact is the training data has confused JS with TS, so the LLM can't "get its head around" the semantic (rather than technical) difference.
Also, the secondary point wasn't just that it was "incorrect"; it's the fact that its reasoning was worthless unless it knew who to ask and which questions to ask.
If somebody tells you that something you know is right is actually wrong, the first thing you ask them is "why do you think that?", not "maybe I should think of this from a new angle, without evidence of what is wrong".
It illustrates a lack of critical thinking, and also shows you missed the point of the question. :D