Comment by Jordan-117
4 days ago
To me, it feels similarly impossible/spooky to how image models work.
Consider a model like SDXL:
- each image is 1024x1024 at the model's native resolution, plenty of detail
- max prompt length is 77 tokens, or a solid paragraph
- each image has a seed value between 0 and 9,999,999, with each seed giving a completely different take on the prompt
I can't begin to calculate the upper limit on the number of human-readable prompts that fit in 77 tokens, but multiply even an (extremely conservative) estimate of a million possible prompts by 10 million seeds and it's clear that this model "contains", at minimum, some ten trillion (10^6 x 10^7 = 10^13) possible meaningful images -- all in a model file that's under 7 GB.
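To make the back-of-envelope math concrete, here is a rough sketch in Python. The numbers are just the guesses from this comment (a million prompts, 10 million seeds, 7 GB as an upper bound on file size), and the function names are made up for illustration; nothing here measures the actual model.

```python
# Back-of-envelope figures from the comment; all are rough assumptions.
PROMPTS = 10**6          # "extremely conservative" count of meaningful prompts
SEEDS = 10**7            # seeds 0 through 9,999,999
MODEL_BYTES = 7 * 10**9  # "under 7 GB", treated here as an upper bound


def distinct_images(prompts: int = PROMPTS, seeds: int = SEEDS) -> int:
    """Count prompt/seed pairs, assuming each pair yields a different image."""
    return prompts * seeds


def bytes_per_image(model_bytes: int = MODEL_BYTES,
                    images: int = 10**13) -> float:
    """Model bytes available per image if the file had to store each one."""
    return model_bytes / images


if __name__ == "__main__":
    total = distinct_images()
    print(f"{total:,} prompt/seed pairs")
    print(f"{bytes_per_image(images=total):.4f} model bytes per pair")
```

Under those assumptions the file has well under a thousandth of a byte per image, so it can't be storing the images in any literal sense; it has to be generating them from something far more compressed.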
I suspect it works similarly to the biological side -- evolutionary pressure encoding complex patterns into hyper-efficient "programs" that aren't easily interpretable, but eerily effective despite their compact size.