Comment by OtherShrezzing

8 months ago

If it proves correct, it'd be an important insight. If you can run three low-inference-cost models and get performance comparable to a single paid frontier model in agentic workflows, that suggests something general about how model performance scales.

If your product is "good enough" with the current generation of models, you could cut OpenAI/Anthropic/Google out of the loop entirely by using open source & low-cost models.
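As a sketch of the "alloy" idea: several cheap models take turns in one agentic loop, all reading and writing a shared transcript. This is only an illustration, not anyone's actual implementation; `call_model` is a stand-in for whatever inference API you'd use, and the stop condition is deliberately crude.

```python
import itertools

def run_alloy(models, call_model, task, max_turns=6):
    """Round-robin an agentic loop over several models sharing one transcript.

    `models` is a list of model names; `call_model(model, history)` is a
    placeholder for your real inference call (OpenAI-compatible, local, etc.).
    """
    history = [{"role": "user", "content": task}]
    # Cycle through the models, one turn each, up to max_turns total.
    for model in itertools.islice(itertools.cycle(models), max_turns):
        reply = call_model(model, history)
        history.append({"role": "assistant", "content": reply, "model": model})
        if "DONE" in reply:  # crude stop condition for the sketch
            break
    return history

# Stub inference so the sketch runs without any API keys.
def fake_call(model, history):
    return f"{model} saw {len(history)} messages"

transcript = run_alloy(["small-a", "small-b", "small-c"], fake_call,
                       "translate this string")
```

The point of the shared transcript is that each cheap model sees, and can correct, the others' reasoning, which is where any alloy advantage would have to come from.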

I don't think an alloy can match a larger model in general, though perhaps it can in special cases.

Say you want to translate a string from English to language X. Models A and B, having fewer parameters to spare, know less about language X; model C, a larger model, knows it better. The missing knowledge simply isn't in A's or B's weights, so no matter how A and B collude, they cannot exceed the performance of model C.