
Comment by gnulinux

8 months ago

I'm curious whether this would also improve small local models. E.g. if I "alloy" Qwen3-8B and OpenThinker-7B, is it going to be "better" than each model on its own? I'll try testing this on my M1 Pro.

Would it really matter? Normally you use those small local models because you don't have the memory to spare for a larger model, so the real question would be: Is an alloy of Qwen3-8B and OpenThinker-7B better than a Qwen3-15B?

Beyond a certain smallness threshold it might also work to constantly swap the models in and out of memory, but I doubt that's a great experience to build on top of.

  • If it proved correct, it'd be an important insight. If you can run three low-inference-cost models and get performance comparable to a single paid frontier model in agentic workflows, that would suggest something general about the way model performance scales.

    If your product is "good enough" with the current generation of models, you could cut OpenAI/Anthropic/Google out of the loop entirely by using open source & low-cost models.

    • I don't think an alloy can be as good as a larger model in general, though perhaps in special cases it can be.

      Say that you want to translate a string from English to language X. Models A and B, having fewer parameters to spare, have less knowledge of language X. Model C, a larger model, has better knowledge of language X. No matter how A and B collude, they will not exceed the performance of model C.

  • Yes, it would matter. If you only have the budget to run an 8B model and it's sufficient for the easy problems you have, a better 8B model with the same spec requirements is necessarily better regardless of how it compares to some other model. I have tons of problems I throw a specifically sized model at.

    • > a better 8B model with the same spec requirements

      It's not quite the same spec requirements though. When using an alloy, you need double the disk space (not a huge deal on desktop, but it is on mobile) and significantly higher latency (since you swap the models in/out between every turn), and you can only apply it to multi-turn conversations or sufficiently decomposable problems.

  • Haha, so every question involves multiple 10 GB writes to the disk. I think the cost of new SSDs would be less than the cost of more memory, even in the short term.

    • Were you replying to the right comment? (Though I also don't see another comment where what you are saying makes sense.)
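
The per-turn swapping idea from the thread can be sketched as a small toy: an "alloy" that answers each turn with a different member model, keeping only one set of weights resident at a time. This is a minimal illustration, not a real inference API; the model names, the `load`/`unload` calls, and the `StubModel` class are all hypothetical stand-ins.

```python
import itertools

class StubModel:
    """Hypothetical stand-in for a local model whose weights
    can be paged in and out of memory."""

    def __init__(self, name):
        self.name = name
        self.loaded = False

    def load(self):
        # Stands in for reading ~10 GB of weights off disk (the
        # latency cost discussed above).
        self.loaded = True

    def unload(self):
        # Stands in for freeing the weights to make room.
        self.loaded = False

    def generate(self, prompt):
        return f"[{self.name}] reply to: {prompt}"


def alloy_chat(models, prompts):
    """Round-robin the member models across turns, swapping
    weights in and out between turns."""
    replies = []
    picker = itertools.cycle(models)
    current = None
    for prompt in prompts:
        chosen = next(picker)
        if current is not chosen:
            if current is not None:
                current.unload()
            chosen.load()
            current = chosen
        replies.append(current.generate(prompt))
    return replies


models = [StubModel("Qwen3-8B"), StubModel("OpenThinker-7B")]
print(alloy_chat(models, ["q1", "q2", "q3"]))
```

Every change of speaker triggers a full load, which is why the comment above argues this only makes sense for multi-turn or decomposable problems: a single-turn question pays the swap cost without any of the ensemble benefit.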