Comment by NitpickLawyer
13 hours ago
Not necessarily. There were some tests last year-ish from HF that showed that simply alternating (randomly) between Claude and GPT (whatever their versions were at the time) on a task produced better results than either of them individually. So during a task, the first call was sent to one, then the other, and so on.
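Here is a rough sketch of that per-call mixing, assuming an agent loop where every model reads and appends to one shared transcript. `fake_claude`, `fake_gpt` and `run_alloy` are made-up names; the stubs stand in for real API clients.

```python
import random
from typing import Callable, List

# A "model" takes the shared transcript so far and returns the next message.
Model = Callable[[List[str]], str]

# Hypothetical stand-ins for real Claude / GPT API calls.
def fake_claude(history: List[str]) -> str:
    return f"claude step {len(history)}"

def fake_gpt(history: List[str]) -> str:
    return f"gpt step {len(history)}"

def run_alloy(task: str, models: List[Model], steps: int, seed: int = 0) -> List[str]:
    """Agent loop that sends each call to a randomly chosen model.

    Every model sees the full shared transcript, including the other
    models' earlier outputs, so they build on each other's work.
    """
    rng = random.Random(seed)  # seeded so a run is reproducible
    history = [task]
    for _ in range(steps):
        model = rng.choice(models)  # choose a model independently per call
        history.append(model(history))
    return history

transcript = run_alloy("find the bug", [fake_claude, fake_gpt], steps=4)
```

For strict back-and-forth instead of random picks, replace `rng.choice(models)` with `next(...)` on an `itertools.cycle(models)` iterator.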
There's also the concept of "smart routing": dispatching requests based on some heuristics / embeddings, so that "simple" tasks are handled by smaller (cheaper) models and a bigger model is used to curate / sort / merge the results.
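A toy version of the heuristic flavour might look like this. The length cutoff and keyword list are arbitrary assumptions, and `small_model` / `large_model` are hypothetical stubs. An embedding-based router would replace `is_simple` with something like a classifier over prompt embeddings.

```python
from typing import List

# Hypothetical model tiers; real code would wrap actual API clients here.
def small_model(prompt: str) -> str:
    return f"[small] {prompt}"

def large_model(prompt: str) -> str:
    return f"[large] {prompt}"

# Assumed heuristic: these words suggest a task needs the bigger model.
HARD_KEYWORDS = ("prove", "refactor", "design", "debug")

def is_simple(prompt: str, max_words: int = 30) -> bool:
    """Crude check: short prompts with no 'hard' keyword count as simple."""
    words = prompt.lower().split()
    return len(words) <= max_words and not any(k in words for k in HARD_KEYWORDS)

def route(prompt: str) -> str:
    """Send simple prompts to the cheap model and everything else to the big one."""
    return small_model(prompt) if is_simple(prompt) else large_model(prompt)

def route_and_merge(subtasks: List[str]) -> str:
    """Route each subtask, then have the big model merge the partial results."""
    partials = [route(t) for t in subtasks]
    return large_model("Merge these results:\n" + "\n".join(partials))
```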
There are a lot of things to try here. I wouldn't personally pay for this service, but I don't think it's "a joke"...
See also: Agents built from alloys (July 2025)
https://news.ycombinator.com/item?id=44630724
They randomly alternated between frontier LLMs and got a massive boost to performance on cybersecurity tasks.