Comment by afro88

5 hours ago

I have a real-world problem I gave o1 when it came out, and it got it quite wrong. It's a scheduling problem with four different constraints that vary each day, plus success criteria that must be fulfilled over the whole week.

GPT-5 Thinking (Think Longer) and Opus 4.1 Extended Thinking both get it right.

Maybe this unique problem somehow made it into synthetic training data? Or maybe it didn't, and the paper is wrong? Either way, we have models today that are much more capable at solving unique problems.