Comment by ttul

3 days ago

This is a cool result. Deep learning image models are trained on enormous amounts of data and the information recorded in their weights continues to astonish me. Over in the Stable Diffusion space, hobbyists (as opposed to professional researchers) are continuing to find new ways to squeeze intelligence out of models that were trained in 2022 and are considerably out of date compared with the latest “flow matching” models like Qwen Image and Flux.

Makes you wonder what intelligence is lurking in a 10T parameter model like Gemini 3 that we may not discover for some years yet…

Stable Diffusion 1.5 is a great model for hacking on. It's powerful enough that it encodes some really rich semantics, but small and light enough that hobbyists can iterate on it quickly.

I've got a new potential LoRA implementation that I've been testing locally (using a transformed S matrix with frozen U and V weights from an SVD decomposition of the base matrix), and it seems to work really well. I've also been playing with changes to the forward-noising schedule and the loss functions, which seem to yield empirically superior results to the standard way of doing things. Epsilon prediction may be old and busted (and working on it makes me really appreciate flow matching!), but there's some really cool stuff happening in its training dynamics that is a lot of fun to explore.
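The core idea, as I understand it, can be sketched in a few lines of numpy: factor the base weight matrix with an SVD, freeze the U and V factors, and learn only a modification to the S matrix. This is an illustrative sketch of that decomposition, not the actual SVDLora module (the variable names and the additive-delta choice here are my own assumptions).

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a base model weight matrix (e.g. a linear layer in SD 1.5).
W = rng.standard_normal((8, 8))

# Decompose: W = U @ diag(S) @ Vt. U and Vt would be frozen during training.
U, S, Vt = np.linalg.svd(W, full_matrices=False)

# Sanity check: the factorization reconstructs W up to float error.
assert np.allclose(W, U @ np.diag(S) @ Vt)

# A LoRA-style update then touches only the S matrix; here modeled as a
# small additive delta (in practice this delta would be the trainable part).
delta_S = 0.01 * rng.standard_normal(S.shape)
W_adapted = U @ np.diag(S + delta_S) @ Vt
```

The appeal of this framing is that the learned update is constrained to the singular directions the base matrix already uses, rather than arbitrary low-rank directions as in a standard LoRA.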

It's just a lot of fun. Great playground for both learning how these things work and for trying out new ideas.

  • I’d love to follow your work. Got a GitHub?

    • I do (same username), but I haven't published any of this (and in fact my GitHub has sadly languished lately); I keep working on it with the intent to publish eventually. The big problem with models like this is that the training dynamics have so many degrees of freedom that every time I get close to something I want to publish, I end up chasing down another set of rabbit holes.

      https://gist.github.com/cheald/7d9a436b3f23f27b8d543d805b77f... - here's a quick dump of my SVDLora module, though. I wrote it for use in OneTrainer, but it should be adaptable to other frameworks easily enough. If you want to try it out, I'd love to hear what you find.

Hey, how did you figure all this out? I would be super curious to keep track of current ad-hoc ways of pushing older models to do cooler things. LMK

  • 1) Reading papers. 2) Reading "Deep Learning: Foundations and Concepts". 3) Taking Jeremy Howard's Fast.ai course

Gemini 3 is a 10 trillion parameter model?

  • I read that the pre-training model behind Gemini 3 has 10T parameters. That does not mean the model they're serving each day has 10T parameters. The online model is likely distilled from 10T down to something smaller, but I have not had either claim confirmed by Google. These are anecdotes.