Comment by _davide_

9 days ago

I had a subscription before the price was cut; the model kept randomly looping on the same character (burning 30% of the budget in one shot), and its overall performance for agentic purposes is, simply put, terrible. It finds non-existent bugs and randomly removes chunks of code to "fix" them, then even presents that as an "extra fix". Maybe it's a good general-purpose model; I haven't tested it in that regard.

MiniMax (currently 2.7), which is a ~270B model tuned exclusively for agentic purposes, performs so MUCH better; it's more reliable and cheaper. Both are still far behind Opus 4.7, which I'm using at work. IMO benchmarks are just a very rough estimate; everyone cheats as much as they can get away with. Test the model yourself; don't make any assumptions based on the benchmarks.

I would love to see specialized, cheaper, bleeding-edge models like MiniMax for other, non-agentic purposes as well. Why pay $1 for a general model when you could, for example, pay $0.10 for the content-moderation model you actually need?

2 comments


zarify 9 days ago

Funny, I had the opposite experience with MiniMax and MiMo when using OpenCode. MiniMax kept getting stuck looping through broken tool calls, while MiMo just powered through things and, for the most part, just worked.

shanoaice 6 days ago

Similarly for me, MiniMax is kind of horrible in that it fairly regularly falls into loops that I have to rescue it from. DeepSeek and MiMo rarely got stuck. I wonder how you got the completely reversed experience.