Comment by InitialPhase55

20 hours ago

Curious, how did you settle on Haiku/Sonnet? Because there are much cheaper models on OpenRouter that probably perform comparably...

Consider:

Haiku 4.5: $1/M input tokens | $5/M output tokens
MiniMax M2.7: $0.30/M input tokens | $1.20/M output tokens
Kimi K2.5: $0.45/M input tokens | $2.20/M output tokens

I haven't tried, so I can't say for sure, but from personal experience I think M2.7 and K2.5 can match Haiku, and probably exceed it on most tasks, at a much lower price.
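To put a rough number on the gap, here is a minimal sketch using the prices quoted above. The monthly token volumes are made up for illustration, not anyone's real usage, and `monthly_cost` is just a hypothetical helper:

```python
# Prices in USD per million tokens (input, output), as quoted above.
PRICES = {
    "Haiku 4.5": (1.00, 5.00),
    "MiniMax M2.7": (0.30, 1.20),
    "Kimi K2.5": (0.45, 2.20),
}


def monthly_cost(model, input_tokens, output_tokens):
    """Return the USD cost of a workload at the model's listed prices."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


# Assumed workload, for illustration only.
INPUT_TOKENS = 100_000_000
OUTPUT_TOKENS = 10_000_000

for model in PRICES:
    cost = monthly_cost(model, INPUT_TOKENS, OUTPUT_TOKENS)
    print(f"{model}: ${cost:.2f}")
```

At that assumed volume it comes to about $150 for Haiku 4.5, $42 for MiniMax M2.7 and $67 for Kimi K2.5. Price only, of course; it says nothing about quality.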

lanyard-textile 12 hours ago

Since they're opening it publicly on IRC here, the safety rails might be a consideration. I made an agent recently, and that's why I'm paying a premium to Anthropic atm -- though I'm still experimenting to see if it's really necessary.

It's getting some organic usage -- 100M input tokens for just chats this month -- and I've seen enough users throw Haiku against the wall and fail to trick it into misbehaving. It "pumps the brakes" a lot and imitates annoyance when you ask it repeatedly :) It handles emotionally driven real-life questions mid-conversation well. It just works.

I'm not seeing all of that consistently with other models I've tried so far -- but I've assumed it's not a completely fair comparison with (e.g.) open weights, since these safety rails presumably don't always come from the raw model calls alone.

InitialPhase55 5 hours ago

Good point! Didn't consider that aspect, agree.

nl 16 hours ago

Xiaomi Mimo v2-Flash is fantastic.

I have a relatively hard personal agentic benchmark, and Mimo v2-Flash scores 8% higher in 109 seconds for $0.003 (0.3 cents!) vs Haiku, which took 262 seconds for $0.24 (24 cents).
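A quick sketch of what those figures work out to as ratios (single-run numbers from my own benchmark, so treat them as rough):

```python
# Figures quoted above for one benchmark run.
mimo_cost, mimo_seconds = 0.003, 109
haiku_cost, haiku_seconds = 0.24, 262

cost_ratio = haiku_cost / mimo_cost        # how many times more Haiku cost
time_ratio = haiku_seconds / mimo_seconds  # how many times longer Haiku took

print(f"cost: {cost_ratio:.0f}x, time: {time_ratio:.1f}x")
```

That is roughly 80x the cost and 2.4x the wall-clock time for Haiku on this run.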

Gemini 3.1 Flash Lite Preview (yes that is its name) is also a solid choice.

efromvt 5 hours ago

The Gemini models are fantastic for the price, but the naming scheme is ridiculous; I have to triple-check it every time.

ruguo 18 hours ago

MiniMax M2.7 is actually pretty solid. I’ve been using it for coding lately and it handles most tasks just fine, but Opus 4.6 is still on another level.

jeremyjh 17 hours ago

MiniMax's Token Plan is even less expensive and agent usage is explicitly allowed.

faangguyindia 17 hours ago

Just use Gemini flash3, it's better than Haiku.

0123456789ABCDE 11 hours ago

unless gp really cares about lower hallucination rates
https://artificialanalysis.ai/?omniscience=omniscience-hallu...

attentive 16 hours ago

or better yet 3.1 Flash-Lite at $0.25/1M input

ls612 18 hours ago

Because this is probably paid marketing by Anthropic?