Comment by refulgentis
3 days ago
I’ll go ahead and say they’re wrong (source: I build and maintain an LLM client with llama.cpp integrated and 40+ third-party models via HTTP).
I desperately want there to be differentiation. Reality has shown over and over that it doesn’t matter. Even if you run the same query across X models and then apply some form of consensus, the improvements on benchmarks are marginal and the UX is worse: more time, more cost, and a final answer that is muddied and still bounded by the quality of the best model.
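For concreteness, here is a minimal sketch of the "same query across X models, then consensus" pattern being described. `query_model` is a hypothetical stand-in for whatever HTTP call your client makes; the canned answers are illustrative only:

```python
from collections import Counter

def query_model(model: str, prompt: str) -> str:
    # Hypothetical placeholder: in a real client this would be an
    # HTTP request to the model's API. Canned answers for illustration.
    canned = {
        "model-a": "42",
        "model-b": "42",
        "model-c": "41",
    }
    return canned[model]

def consensus_answer(models: list[str], prompt: str) -> str:
    """Send the same prompt to each model and return the majority-vote answer."""
    answers = [query_model(m, prompt) for m in models]
    winner, _count = Counter(answers).most_common(1)[0]
    return winner

print(consensus_answer(["model-a", "model-b", "model-c"], "What is 6*7?"))
# prints "42"
```

Note the costs the comment mentions are visible even here: N requests instead of one (more time, more expense), and majority voting can only surface an answer that some model in the pool already produced.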
Thanks. Are there any links where I can learn more about this?
I did some Googling, and it appears there are examples where people report that combining multiple models, or multiple runs of the same model, leads to improvements: https://www.sciencedirect.com/science/article/abs/pii/S00104... https://arxiv.org/abs/2203.11171
But presumably people are less likely to publish a paper when an approach doesn’t work.
Are you saying I’m wrong that some models are better for some tasks than others, but there isn’t a universally best model for all tasks?