Comment by Animats
1 month ago
Yes, such systems are still struggling with continuity.
(There might be a workflow solution to that. Part of the system needs to do the job that old film credits list as the "continuity girl". For each shot, there's a blocking diagram of who stands where at the beginning of the shot, plus a description of what each character is wearing, holding, or touching. If something generated the same record for the end of each shot, and it was fed into the prompt for the beginning of the next shot, that would help maintain continuity. This is another example of where a concrete mid-level abstraction is needed to keep things on track.)
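A minimal sketch of that hand-off, assuming the end-of-shot state has already been extracted somehow (by hand, or by a vision model; that step is not shown). Every name here (`CharacterState`, `ContinuityRecord`, `next_shot_prompt`) is hypothetical and not taken from any real tool; it only illustrates carrying a structured record from the end of one shot into the prompt for the next.

```python
from dataclasses import dataclass, field


@dataclass
class CharacterState:
    """One character's state at a shot boundary (hypothetical schema)."""
    name: str
    position: str  # blocking, e.g. "left of the doorway"
    wearing: list[str] = field(default_factory=list)
    holding: list[str] = field(default_factory=list)
    touching: list[str] = field(default_factory=list)


@dataclass
class ContinuityRecord:
    """State of every character at the END of a given shot."""
    shot_id: int
    characters: list[CharacterState]


def _describe(c: CharacterState) -> str:
    # Turn one character's state into a single prompt sentence,
    # skipping any category that is empty.
    parts = [f"{c.name} stands {c.position}"]
    if c.wearing:
        parts.append("wearing " + ", ".join(c.wearing))
    if c.holding:
        parts.append("holding " + ", ".join(c.holding))
    if c.touching:
        parts.append("touching " + ", ".join(c.touching))
    return "; ".join(parts) + "."


def next_shot_prompt(prev_end: ContinuityRecord, action: str) -> str:
    # Prepend the previous shot's end state to the next shot's prompt.
    lines = [f"Opening state (carried over from the end of shot {prev_end.shot_id}):"]
    lines += ["- " + _describe(c) for c in prev_end.characters]
    lines.append("Action: " + action)
    return "\n".join(lines)


end_of_shot_1 = ContinuityRecord(
    shot_id=1,
    characters=[
        CharacterState(
            name="Anna",
            position="left of the doorway",
            wearing=["red coat"],
            holding=["umbrella"],
        )
    ],
)
print(next_shot_prompt(end_of_shot_1, "Anna steps outside."))
```

The point of the structured record rather than free text is that it can be checked or diffed between shots; whether a given video model actually honors such a prompt is a separate question.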
Anyone have any idea what tool generated this? It's way past Stable Diffusion.
Local text-to-video models such as LTX Video 2.3, or even the older WAN, could easily handle this. Then combine the output with something like SeedVR2 to upscale.
https://github.com/Lightricks/ComfyUI-LTXVideo