Comment by tuhgdetzhh

3 months ago

The test is rigged because they used non-thinking models.

7 comments

handoflixue 3 months ago

Testing some subset X does not mean the test is rigged unless they failed to disclose that.

But also:

GPT 5.2 Thinking, Standard Effort: Walk - https://chatgpt.com/share/699d38cb-e560-8012-8986-d27428de8a...

I'm assuming "GPT 5.2 Thinking" is, in fact, a thinking model?

randomtoast 3 months ago
The problem is that you didn't use the API; you used your ChatGPT subscription, with its personality, memories, and possibly customization. I can see, for instance, that your ChatGPT answers with emojis, while my ChatGPT subscription never does.
If you ask GPT 5.2 with high reasoning effort in the API, you get 10 out of 10: drive.
- handoflixue 3 months ago
  
  If it doesn't work at all on the most popular pricing plan (subscription), AND it doesn't work through the most popular way of accessing it (web), then it seems fair to say there's a problem.
  And the problem is NOT that I'm using a product in the advertised, intended way.

felix089 3 months ago

These are reasoning / thinking models

tuhgdetzhh 3 months ago
Source?
- tverbeure 3 months ago
  
  I don't know, but model names such as "kimi-k2-thinking" in the test set might offer a clue.
  