Comment by rstuart4133

21 hours ago

I get the impression the hive mind hasn't come to terms with the point that a model is optimised for certain tasks. It's like having someone ask you, "Is that a good hammer?" Good for what? There are claw hammers, sledgehammers, ball-peen hammers, club hammers, mallets, and so on. Yes, in a pinch they can all bang in nails, but you wouldn't choose a dead blow hammer for that if you had a choice.

Gemini Flash is very good at search. Just about any low-end model can toss out a poem. All the higher-end models (open source and otherwise) seem to be able to churn out code that passes tests. The smaller, "less capable" ones are much faster at it, which means that in the hands of a skilled practitioner they are the best choice for that task. But they rapidly fall apart where there isn't a hard source of truth (like a good test suite) to grind against. Because of that, you have to use a bigger model for bug finding. In that task the open source models tend to fail on larger code bases, where something like Opus still shines. I gather Mythos is an absolute monster, unparalleled, and unavailable. I'm sure one of the reasons for that is that it's so expensive to run.
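As a rough sketch of that selection logic in Python: the task kinds and the tier names ("small", "large", "frontier") are made up for illustration and don't correspond to any vendor's API or product line. The thresholds are assumptions too; the only point is that the choice depends on the task, not on a single "best model" ranking.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    kind: str                     # hypothetical labels: "search", "poem", "codegen", "bug_finding"
    has_test_suite: bool = False  # is there a hard source of truth to grind against?
    large_codebase: bool = False  # does the work span a big code base?


def pick_tier(task: Task) -> str:
    """Return an illustrative model tier for a task (assumed tiers, not real products)."""
    if task.kind in {"search", "poem"}:
        return "small"      # cheap, fast models are enough here
    if task.kind == "codegen" and task.has_test_suite:
        return "small"      # speed wins when tests catch the mistakes
    if task.kind == "bug_finding" and task.large_codebase:
        return "frontier"   # no hard oracle and a lot of context to hold
    return "large"          # default when there is nothing to check against


if __name__ == "__main__":
    print(pick_tier(Task("codegen", has_test_suite=True)))
    print(pick_tier(Task("bug_finding", large_codebase=True)))
```

The one rule that matters is the `has_test_suite` check: with a good oracle you can trade capability for speed, and without one you can't.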

Or to put it another way: you don't use a 100 tonne crane to pick up the shopping. And the smaller models will happily run on in-house hardware. You may not do it today, because of current DRAM prices and because integrated NPUs have only just started shipping, but in 5 years' time models will be running on your phone.

1 comment

Npovview 3 hours ago

Yes, exactly. We will have specialized models soon. They will be trained with a plugin architecture, with a core reasoning model asking plugin models to do things on its behalf. I don't need Chinese or Russian knowledge in my workflow.
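A minimal Python sketch of what I mean by that, assuming the "plugins" are just callables registered under a skill name. `CoreModel`, `register`, and `handle` are hypothetical names for illustration, not an existing framework, and a real system would have the core model decide which skill to call rather than being told.

```python
from typing import Callable, Dict

# A plugin is modelled here as any function from a request string to a reply string.
Plugin = Callable[[str], str]


class CoreModel:
    """Stand-in for a core reasoning model that delegates to specialist plugins."""

    def __init__(self) -> None:
        self._plugins: Dict[str, Plugin] = {}

    def register(self, skill: str, plugin: Plugin) -> None:
        # Only the skills you register are available; nothing else is loaded.
        self._plugins[skill] = plugin

    def handle(self, skill: str, request: str) -> str:
        plugin = self._plugins.get(skill)
        if plugin is None:
            # No specialist installed: the core answers by itself.
            return f"[core] {request}"
        return plugin(request)


if __name__ == "__main__":
    core = CoreModel()
    core.register("sql", lambda q: f"[sql] {q}")
    print(core.handle("sql", "list active users"))
    print(core.handle("russian", "translate this"))
```

In this setup you simply never register a Chinese or Russian plugin if your workflow doesn't need one.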