Comment by InitialPhase55
20 hours ago
Curious, how did you settle on Haiku/Sonnet? Because there are much cheaper models on OpenRouter that probably perform comparably...
Consider:
Haiku 4.5: $1/M input tokens | $5/M output tokens
MiniMax M2.7: $0.30/M input tokens | $1.20/M output tokens
Kimi K2.5: $0.45/M input tokens | $2.20/M output tokens
I haven't tried so I can't say for sure, but from personal experience, I think M2.7 and K2.5 can match Haiku and probably exceed it on most tasks, for much cheaper.
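To put the per-token prices quoted above side by side, here is a rough sketch of the arithmetic. The prices are copied from this comment; the workload (100M input tokens and 10M output tokens) is purely an assumption for illustration, and the function name is made up.

```python
# Prices in USD per million tokens (input, output), as quoted in the comment.
PRICES = {
    "Haiku 4.5": (1.00, 5.00),
    "MiniMax M2.7": (0.30, 1.20),
    "Kimi K2.5": (0.45, 2.20),
}

# Hypothetical workload: an assumption, not a measured figure.
INPUT_TOKENS = 100_000_000
OUTPUT_TOKENS = 10_000_000


def cost_usd(input_tokens, output_tokens, price_in, price_out):
    """Return the cost in USD for the given token counts and per-million prices."""
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000


if __name__ == "__main__":
    for model, (price_in, price_out) in PRICES.items():
        total = cost_usd(INPUT_TOKENS, OUTPUT_TOKENS, price_in, price_out)
        print(f"{model}: ${total:,.2f}")
```

Under that assumed workload, Haiku 4.5 comes to roughly $150, MiniMax M2.7 to roughly $42, and Kimi K2.5 to roughly $67. The real gap depends on your own input/output mix, and says nothing about quality.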
Since they're opening it publicly on IRC here, the safety rails might be a consideration. I've made an agent recently and that's why I'm paying a premium to Anthropic atm, though I'm still experimenting to see if it's really necessary.
It's getting some organic usage -- 100M input tokens for just chats this month -- and I've seen enough users throw everything at Haiku and fail to trick it into misbehaving. It "pumps the brakes" a lot and imitates annoyance when you ask it repeatedly :) It handles emotionally driven real-life questions mid-conversation well. It just works.
Not seeing all that consistently with other models I've tried so far -- but I've assumed it's not a completely fair comparison with (e.g.) open weights, since these safety rails presumably don't always come from the raw model calls themselves.
Good point! Didn't consider that aspect, agree.
Xiaomi Mimo v2-Flash is fantastic.
I have a relatively hard personal agentic benchmark, and Mimo v2-Flash scores 8% higher in 109 seconds for $0.003 (0.3 cents!) vs Haiku, which took 262 seconds for $0.24 (24 cents).
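For a sense of scale, the ratios implied by those numbers can be worked out as below. This is only a sketch using the figures from this comment; the dictionary layout and function name are made up, and a single benchmark run is not a general result.

```python
# Figures as reported in the comment: wall-clock seconds and total USD cost per run.
MIMO = {"seconds": 109, "usd": 0.003}
HAIKU = {"seconds": 262, "usd": 0.24}


def ratios(baseline, candidate):
    """Return (cost_ratio, time_ratio) of baseline relative to candidate."""
    cost_ratio = baseline["usd"] / candidate["usd"]
    time_ratio = baseline["seconds"] / candidate["seconds"]
    return cost_ratio, time_ratio


if __name__ == "__main__":
    cost_ratio, time_ratio = ratios(HAIKU, MIMO)
    print(f"Haiku cost {cost_ratio:.0f}x as much and took {time_ratio:.1f}x as long")
```

On these numbers Haiku cost about 80x as much and took about 2.4x as long for that one run.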
Gemini 3.1 Flash Lite Preview (yes that is its name) is also a solid choice.
The Gemini models are fantastic for the price, but the naming scheme is ridiculous; I have to triple-check it every time.
MiniMax M2.7 is actually pretty solid. I’ve been using it for coding lately and it handles most tasks just fine, but Opus 4.6 is still on another level.
MiniMax's Token Plan is even less expensive and agent usage is explicitly allowed.
just use gemini flash3, it's better than haiku
unless gp really cares about lower hallucination rates
https://artificialanalysis.ai/?omniscience=omniscience-hallu...
or better yet 3.1 Flash-Lite at $0.25/1M input
Because this is probably paid marketing by Anthropic?