← Back to context

Comment by numba888

5 months ago

Not sure what they found. Either model is unable, or they were unable to solve the tasks using models. Loos like they used strait questions and not Chain of Thoughts. The result for the same model depends on how you ask. The tasks probably required more thinking under the hood than model is allowed to do in one request. More interesting would be if model is capable of solving given enough time. Using multiple requests orchestrated by some framework automatically.