Comment by cheald

2 days ago

Stable Diffusion 1.5 is a great model for hacking on. It's powerful enough to encode some really rich semantics, but small and light enough that iterating on it is quick, which puts it within reach of hobbyists.

I've got a new potential LoRA implementation that I've been testing locally (using a transformed S matrix with frozen U and V weights from an SVD of the base matrix) that seems to work really well, and I've been playing with changes to both the forward-noising schedule and the loss functions that seem to yield empirically superior results to the standard way of doing things. Epsilon prediction may be old and busted (and working on it makes me really appreciate flow matching!) but there's some really cool stuff happening in its training dynamics that's a lot of fun to explore.
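For readers who want a concrete picture of the SVD idea above, here is a minimal NumPy sketch. It is not the author's module: the class name `SVDAdapter`, the `delta` parameter, and the choice of a per-singular-value multiplicative scaling as the "transform" of S are all assumptions made for illustration. The only things taken from the comment are that U and V stay frozen and that only S is adapted.

```python
import numpy as np

class SVDAdapter:
    """Toy adapter (hypothetical): W' = U @ diag(s * (1 + delta)) @ Vt.

    U and Vt come from an SVD of the frozen base weight and are never updated.
    Only `delta`, one value per singular value, would be trained.
    """

    def __init__(self, weight):
        # Decompose the base weight once; U, s, Vt are treated as constants.
        self.U, self.s, self.Vt = np.linalg.svd(weight, full_matrices=False)
        # Assumed parameterization: zero init reproduces the base weight exactly.
        self.delta = np.zeros_like(self.s)

    def effective_weight(self):
        # Scaling the columns of U by the adapted singular values is
        # equivalent to U @ diag(s_adapted), without building the diagonal.
        s_adapted = self.s * (1.0 + self.delta)
        return (self.U * s_adapted) @ self.Vt

    def forward(self, x):
        # x has shape (batch, in_features); weight is (out_features, in_features).
        return x @ self.effective_weight().T


# Usage: a 6x4 base weight gives 4 trainable values instead of 24.
rng = np.random.default_rng(0)
base_weight = rng.normal(size=(6, 4))
adapter = SVDAdapter(base_weight)
batch = rng.normal(size=(3, 4))
output = adapter.forward(batch)
```

With this parameterization the adapter starts as an exact no-op, and the number of trainable values equals the rank of the base matrix. A real training setup would need an autograd framework and possibly a richer transform of S than a diagonal scaling.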

It's just a lot of fun. Great playground for both learning how these things work and for trying out new ideas.

I’d love to follow your work. Got a GitHub?

  • I do (same username), but I haven't published any of this (and in fact my GitHub has sadly languished lately); I keep working on it with the intent to publish eventually. The big problem with models like this is that the training dynamics have so many degrees of freedom that every time I get close to something I want to publish, I end up chasing down another set of rabbit holes.

    https://gist.github.com/cheald/7d9a436b3f23f27b8d543d805b77f... - here's a quick dump of my SVDLora module, though. I wrote it for use in OneTrainer, but it should be adaptable to other frameworks easily enough. If you want to try it out, I'd love to hear what you find.

    • This is super cool work. I’ve built some new sampling techniques for flow matching models that encourage the model to take a “second look” by rewinding sampling to a midpoint and then running the clock forward again. This worked really well with diffusion models (pre-DiT models like SDXL) and I was curious whether it would work with flow matching models like Qwen Image. Yes, it does, but the design is different because flow matching models aren’t de-noising pixels so much as they are simply following a vector field at each step like a ship being pushed by the wind.
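One way to picture the "second look" idea for a flow matching sampler is the sketch below. It is a guess at the general shape, not the commenter's method: the function names, the `rewind_at` and `rewind_to` parameters, and the re-noising rule are all hypothetical. It assumes the common rectified-flow convention x_t = (1 - t) * x0 + t * noise, so the velocity is noise - x0 and a one-step clean estimate is x_t - t * v.

```python
import numpy as np

def euler_integrate(velocity, x, t_from, t_to, n_steps):
    """Follow dx/dt = velocity(x, t) from t_from to t_to with Euler steps."""
    ts = np.linspace(t_from, t_to, n_steps + 1)
    for t, t_next in zip(ts[:-1], ts[1:]):
        x = x + (t_next - t) * velocity(x, t)
    return x

def sample_with_rewind(velocity, noise, rng, rewind_at=0.3, rewind_to=0.6,
                       n_steps=20):
    """Sample from t=1 (noise) to t=0 (data), rewinding once (hypothetical).

    Assumed convention: x_t = (1 - t) * x0 + t * noise.
    """
    # First pass: run the clock from pure noise down to t = rewind_at.
    x = euler_integrate(velocity, noise, 1.0, rewind_at, n_steps)
    # One-step estimate of the clean sample implied by the current velocity.
    x0_hat = x - rewind_at * velocity(x, rewind_at)
    # Rewind: place that estimate back on the path at the later (noisier)
    # time t = rewind_to, using fresh noise.
    fresh_noise = rng.normal(size=x.shape)
    x = (1.0 - rewind_to) * x0_hat + rewind_to * fresh_noise
    # Second pass: run the clock forward again, all the way to t = 0.
    return euler_integrate(velocity, x, rewind_to, 0.0, n_steps)


# Toy check: if all data sits at one point, the exact field is (x - target) / t.
target = np.array([2.0, -1.0])
toy_velocity = lambda x, t: (x - target) / t
rng = np.random.default_rng(0)
sample = sample_with_rewind(toy_velocity, rng.normal(size=2), rng)
```

In the toy case every trajectory ends at `target`, so the rewind changes nothing. With a learned velocity field the second pass starts from a state informed by the first pass, which is where any benefit would have to come from.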
