Comment by black_knight

5 days ago

Still does not crack my hardest nuts. I gave it one of them and it blew through my entire allowance thinking about a single question, with no answer in sight!

I see a lot of people saying they are happy with weaker models, but I am the opposite: I need more strength, more intelligence!

I am quite happy that Opus 4.8 can do some medium-difficulty problems. And maybe Fable 5 can do more of those! I have a lot of problems to solve!

I also see a lot of people saying they are happy with weaker models.

At work I had to switch to using GPT 5.4 Mini and Qwen 3.6 27B.

The results were near useless.

The error rate is through the roof; they are constantly wrong in their conclusions, even when investigating very simple issues.

Further, the models are too unreliable to even move 20-line snippets around without inadvertently modifying them. Ask them to correct it and they still get it wrong.

Maybe the larger Chinese models are better, but the Mini stuff is next to useless to me.

  • I have Qwen 3.6 27B and 35B running locally, and coming from Opus it feels like talking to an impostor: someone who pretends to be competent but really isn't. The results are always disappointing. Sonnet is better, but I have given up on asking it; even for simple things I wait for my Opus limits to reset.

What kind of problems are you trying to have it solve?

  • The Riemann hypothesis, PvNP, and the Collatz conjecture.

    • Not these. I wonder if the well is poisoned there. The models know that these are "unpossible", so it might not solve them just because… Maybe some day.

      I am just testing it on stuff I know intimately myself. I would probably not understand a proof of Collatz if it was dancing in front of me!


    • > The Riemann hypothesis, PvNP, and the Collatz conjecture.

      Did you add "make no mistake" to your prompt?

    • Is this a joke? Seriously? These are some of the hardest problems in math, period. Hundreds if not thousands of the greatest minds in history have attempted to solve them. And you think the current level of AI can blow through them? It is also possible that, for example, the Riemann hypothesis is simply not provable (Gödel's incompleteness theorem).


  • The medium ones are results where one needs to construct some object, which my intuition tells me should exist. The difficult ones are typically to show that certain objects can not be constructed.

    These are not Fields Medal-type problems, nor known difficult or open conjectures. Just small stuff I have collected in my todo list over the years.

    • I have some medium-difficulty math problems on which I have repeatedly used the models over the last year and a half. Back then they were already good at pointing out obstructions and constructing counterexamples, so that tracks. But at first glance it looks like Fable actually made real progress on one problem for the first time.

      A year ago my judgement was that I had wasted my time on trying to work with the models and doing things myself would have been more productive as I would have gained intuition from the failures. Now it definitely seems to have figured out stuff that would have taken me more time than I have to spare on this problem...
