Comment by rhdunn
3 days ago
Reading the post, the architectural change is combining a vision model (Mistral 3 in the FLUX.2 case) with a rectified flow transformer.
I wonder if this architectural change makes it easier to use other vision models such as the ones in Llama 3 and 4, or possibly a future Llama 5.