Comment by _the_inflator

18 hours ago

Processing power, not training. The larger the scene in 2ď the more you need to compute. The resolution itself is not flexible. Imagine painting a white canvas. It is still a pixel per pixel algo which costs LLM GPU power while being the easiest thing to do without it.

You can create larger images by creating separate parts you recombine. But they may not perfectly match their borders.

It is a Landau thing not a trading thing. The idea of LLM is to work on the unknown.

1 comment

_the_inflator

vunderba 17 hours ago

It depends on the model. Diffusion models, which are among the more popular approaches, are typically trained at a specific image resolution.

For example, SDXL was trained on 1MP images, which is why if you try to generate images much larger than 1024×1024 without using techniques like high-res fixes or image-to-image on specific regions, you quickly end up with Cthulhu nightmare fuel.