Comment by philipp-gayret

21 days ago

Nice work! I spent a week working very extensively with all kinds of local models on an NVIDIA Spark. Quantized Gemma and Qwen shine somewhat, but overall the results compared to, say, Claude Haiku were so disappointing (in the context of tool calling) that I ended up returning the hardware. I'm curious how the same local models and benchmarks I have will hold up, so I'll try this.

zambelli 21 days ago

Good luck! Frontier models are called frontier for a reason. I've seen Forge get local models close to frontier on these evals, and even beat it in some cases, but frontier still has an edge overall - no denying it.

The key, I think, is to look at the use cases you have that aren't big monsters: auditing logs, home assistant, reading and summarizing news RSS feeds, etc. Stuff that's fairly bite-sized per task but high volume. That's where local models make sense, and they just need mechanical reliability to close the gap.