Comment by codezero

1 day ago

I am amazed, though not entirely surprised, that these models keep getting smaller while quality and effectiveness increase. Z Image Turbo is wild; I'm looking forward to trying this one out.

An older thread on this has a lot of comments: https://news.ycombinator.com/item?id=46046916

There are probably some subtler tipping points that small models hit too. A 100GB model carries non-trivial difficulty in downloading and running it that a 4GB model doesn't face. At 4GB I think it's reasonable to assume that most devs can just try it and see what it does.

Quality is increasing, but these small models have very little knowledge compared to their big brothers (Qwen Image, full-size Flux 2) when it comes to characters, artists, specific items, and so on.

  • Agreed - given what Tongyi-MAI Lab was able to accomplish with a 6B model, I would love to see what they could do with something larger: somewhere in the 15-20B range, between these smaller models (ZiT, Klein) and the significantly larger ones (Flux.2 dev).

  • I smell the bias-variance tradeoff: by underfitting more, they get closer to the degenerate case of a model that only knows one perfect photo. (A toy illustration of the tradeoff is sketched below.)
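
  A minimal sketch of that tradeoff (my own toy example, not from the thread): fit polynomials of increasing degree to noisy samples of a fixed target function and compare train vs. held-out error. Low degrees underfit (high bias), high degrees overfit (high variance); degree 0 is the degenerate one-answer-for-everything model the parent describes.

      import numpy as np

      def target(x):
          # Fixed "ground truth" the model is trying to learn.
          return np.sin(2 * np.pi * x)

      rng = np.random.default_rng(0)
      x_train = np.linspace(0, 1, 20)
      y_train = target(x_train) + rng.normal(0.0, 0.2, x_train.shape)
      x_test = np.linspace(0, 1, 200)

      for degree in (0, 1, 3, 9):
          # Degree 0 is the analogue of "only knows one perfect photo".
          coeffs = np.polyfit(x_train, y_train, degree)
          train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
          test_mse = np.mean((np.polyval(coeffs, x_test) - target(x_test)) ** 2)
          print(f"degree {degree}: train MSE {train_mse:.3f}, test MSE {test_mse:.3f}")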

Is there a theoretical minimum number of params for a given output? I saw news about GPT 3.5, then DeepSeek training models at a fraction of that cost, then laptops running a model that beats 3.5. When does it stop?
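
For what it's worth, no hard theoretical floor is known; the closest empirical answer is neural scaling laws, which predict loss decaying smoothly toward an irreducible term rather than stopping at a cliff. A rough sketch of my own, plugging in the fitted constants reported in the Chinchilla paper (Hoffmann et al., 2022):

    # Chinchilla-style scaling law: L(N, D) = E + A/N^alpha + B/D^beta,
    # where N is parameter count and D is training tokens. Constants are
    # the fitted values reported by Hoffmann et al. (2022).
    E, A, B, ALPHA, BETA = 1.69, 406.4, 410.7, 0.34, 0.28

    def predicted_loss(n_params: float, n_tokens: float) -> float:
        """Predicted pretraining loss for N parameters and D tokens."""
        return E + A / n_params**ALPHA + B / n_tokens**BETA

    # Loss keeps falling as N grows but never crosses the floor E, so
    # "when does it stop?" has no sharp answer, only diminishing returns.
    for n in (1e9, 6e9, 70e9, 500e9):
        print(f"{n / 1e9:4.0f}B params @ 1.4T tokens: loss ~ {predicted_loss(n, 1.4e12):.3f}")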