Comment by NitpickLawyer

13 hours ago

Not necessarily. There were some tests last year-ish from HF showing that simply alternating (randomly) between Claude and GPT (whatever their versions were at the time) on a task produced better results than either model did on its own. So within a single task, the first call went to one model, the next to the other, and so on.
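
A minimal Python sketch of that per-call alternation. Assumptions: `Model` is a hypothetical interface (a function from the shared transcript to the next message), and `model_a` / `model_b` are stubs standing in for real API clients. The key point is that the model is picked per call, not per task, and every model sees the same transcript, so each one continues the other's work.

```python
import random
from typing import Callable, List

# Hypothetical interface: takes the shared transcript, returns the next message.
Model = Callable[[List[str]], str]


def run_alloy(models: List[Model], task: str, steps: int, seed: int = 0) -> List[str]:
    """Run an agent loop where each call goes to a randomly chosen model."""
    rng = random.Random(seed)
    transcript = [task]
    for _ in range(steps):
        model = rng.choice(models)  # choose a model for this call only
        transcript.append(model(transcript))  # all models share one history
    return transcript


# Stub models for illustration; real ones would wrap Claude / GPT API calls.
def model_a(history: List[str]) -> str:
    return f"A: step {len(history)}"


def model_b(history: List[str]) -> str:
    return f"B: step {len(history)}"


if __name__ == "__main__":
    for line in run_alloy([model_a, model_b], "Find the bug.", steps=4):
        print(line)
```

Swapping `rng.choice` for `models[i % len(models)]` would give strict round-robin instead of random alternation.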

There's also the concept of "smart routing": sending each request to a model chosen by some heuristics / embeddings. "Simple" tasks get handled by smaller (cheaper) models, and a bigger model curates / sorts / merges the results.
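
A rough Python sketch of that routing pattern. Assumptions: the `is_simple` check is a made-up length/keyword heuristic standing in for whatever heuristic or embedding classifier a real router would use, and `small` / `large` are stub models, not real clients. Subtasks are routed individually, then the large model merges the partial results.

```python
from typing import Callable, List

Model = Callable[[str], str]

# Hypothetical heuristic: short prompts without "hard" keywords count as simple.
HARD_KEYWORDS = ("prove", "refactor", "architecture", "debug")


def is_simple(prompt: str, max_words: int = 30) -> bool:
    text = prompt.lower()
    return len(text.split()) <= max_words and not any(k in text for k in HARD_KEYWORDS)


def route(prompt: str, small: Model, large: Model) -> str:
    """Send simple prompts to the cheap model, everything else to the big one."""
    return small(prompt) if is_simple(prompt) else large(prompt)


def solve(subtasks: List[str], small: Model, large: Model) -> str:
    """Route each subtask, then have the large model merge the partial results."""
    partials = [route(t, small, large) for t in subtasks]
    merge_prompt = "Merge these results:\n" + "\n".join(partials)
    return large(merge_prompt)


# Stub models for illustration only.
def small(prompt: str) -> str:
    return f"small({prompt})"


def large(prompt: str) -> str:
    return f"large({prompt})"


if __name__ == "__main__":
    print(solve(["What is 2+2?", "Debug this stack trace"], small, large))
```

An embedding-based router would replace `is_simple` with a similarity check against labeled example prompts; the dispatch and merge steps stay the same.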

There are a lot of things to try here. I wouldn't personally pay for this service, but I don't think it's "a joke"...

andai 4 hours ago

See also: Agents built from alloys (July 2025)

https://news.ycombinator.com/item?id=44630724

They randomly alternated between frontier LLMs and got a massive boost to performance on cybersecurity tasks.