Comment by vunderba
21 hours ago
I haven’t gotten around to adding Klein to my GenAI Showdown site yet, but if it’s anything like Z-Image Turbo, it should perform extremely well.
For reference, Z-Image Turbo scored 4 out of 15 points on GenAI Showdown. I’m aware that doesn’t sound like much, but given that one of the largest models, Flux.2 (32b), only managed to outscore ZiT (a 6b model) by a single point and is significantly heavier-weight, that’s still damn impressive.
Local model comparisons only:
Can you fix the information bubble on mobile please? When pressing one, it vanishes instantly...
Hey Bombthecat, sorry about that! I can't repro this issue on any of the devices I have (Android Pixel 7, an iPad, etc.).
If you get a chance, could you list your mobile device specs? That way I can at least try it on Browserstack and see if I can figure out a fix.
Samsung, brave browser
Update: Huh, now it's working
Yeah works fine for me on a Pixel 9.
I think it shows problems with your tests tbh. The bigger models are way more capable than you make them out to be. They are also better at training on and understanding CGI render outputs used as references, such as normal maps or ID masks. Your testing suite is the perfect example that structured data implies false confidence. Pure t2i is not a good benchmark anymore.
Thanks for the feedback.
> The bigger models are way more capable than you make them out to be.
No test suite is ever going to be perfect. GenAI Showdown was started with the goal of focusing on a very narrow spectrum of testing (prompt adherence) because, as a creator, that's the aspect of most interest to me.
> Pure t2i is not a good benchmark anymore
Just FYI Image Editing is already a separate benchmark (see the navbar at the top).
> Your testing suite is the perfect example that structured data implies false confidence
Again - the headline is "Specific prompts and challenges with a strong emphasis placed on adherence". If I tried to capture every possible aspect of GenAI models (multimodal, texture maps, periodic motion, tiling, etc.), I'd be at it until the heat death of the universe.
Incidentally - which model (specifically) do you think is ranked unfairly? While Flux.2 [dev] did only score a single point above ZiT, its weighted score is much higher (1442 points vs 911 points).