Comment by AuryGlenz

23 days ago

Quality is increasing, but these small models have very little knowledge compared to their big brothers (Qwen Image, full-size Flux 2): characters, artists, specific items, etc.

Agreed - given what Tongyi-MAI Lab was able to accomplish with a 6B model - I would love to see what they could do with something larger: somewhere in the range of 15-20B, between these smaller models (ZiT, Klein) and the significantly larger ones (Flux.2 dev).

I smell the bias-variance tradeoff. By underfitting more, they get closer to the degenerate case of a model that only knows one perfect photo.
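For reference, the textbook decomposition behind that quip; a model that always emits the same "one perfect photo" sits at the zero-variance, all-bias extreme:

$$
\mathbb{E}\big[(y - \hat{f}(x))^2\big] = \underbrace{\mathrm{Bias}\big[\hat{f}(x)\big]^2}_{\text{grows as capacity shrinks}} + \underbrace{\mathrm{Var}\big[\hat{f}(x)\big]}_{\text{shrinks}} + \sigma^2
$$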

That's what LoRAs are for.

And small models are also much easier to fine-tune than large ones.
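For anyone who hasn't done it: patching a missing concept into a base model is a couple of lines with diffusers. A minimal sketch, where the model repo and LoRA file are placeholders, not real artifacts:

```python
import torch
from diffusers import DiffusionPipeline

# Load a small base model (placeholder repo id).
pipe = DiffusionPipeline.from_pretrained(
    "some-org/some-small-model",  # placeholder, not a real repo
    torch_dtype=torch.float16,
).to("cuda")

# Patch in a concept the base model never learned (placeholder LoRA path).
pipe.load_lora_weights("path/to/character_lora.safetensors")

image = pipe("a photo of the character at the beach").images[0]
image.save("out.png")
```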

  • I hate that excuse. I want the model to know who the Paw Patrol is without either hunting down a LoRA (which probably won't exist, because they're mostly porn) or having to make a dataset, tag it, and then train one myself.