Comment by divyaprakash
2 days ago
Haha, fair enough. The actual internals are basically just one big fight with VRAM. I'm using decord to dump frames straight into GPU memory so the CPU doesn't bottleneck the pipeline. From there, everything (scene detection, HSV transforms, action scoring) is vectorized in torch, mostly in fp16 to avoid ooming. I also had to chunk the audio STFT/flux math because long files were just eating the card alive. The TTS model stays cached as a singleton so it's snappy after the first run, and I'm manually tracking 'Allocated vs Reserved' memory to keep it from choking. Still plenty of refinement left on the roadmap, but it's a fun weekend project to mess around with.
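For anyone curious, here is a rough sketch of the singleton and memory-tracking ideas. It is not the project's actual code: `FakeTTSModel`, `get_tts_model`, and `gpu_memory_report` are made-up names for illustration, and the real loader would build an actual TTS model instead of a stub. The only torch calls used are `torch.cuda.is_available()`, `torch.cuda.memory_allocated()`, and `torch.cuda.memory_reserved()`.

```python
import functools


class FakeTTSModel:
    """Hypothetical stand-in for a real TTS model; construction is the slow part."""

    load_count = 0  # counts how many times a model was actually built

    def __init__(self):
        FakeTTSModel.load_count += 1

    def synthesize(self, text: str) -> str:
        # Placeholder output; a real model would return audio samples.
        return f"<audio for {text!r}>"


@functools.lru_cache(maxsize=1)
def get_tts_model() -> FakeTTSModel:
    # The first call builds the model; every later call returns the same object.
    return FakeTTSModel()


def gpu_memory_report():
    """Return allocated vs reserved bytes, or None if torch or CUDA is missing."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return {
        # Bytes currently occupied by live tensors.
        "allocated": torch.cuda.memory_allocated(),
        # Bytes held by PyTorch's caching allocator (always >= allocated).
        "reserved": torch.cuda.memory_reserved(),
    }
```

Calling `get_tts_model()` twice hands back the same object, so only the first call pays the load cost. A large gap between reserved and allocated usually means the allocator is holding on to cached blocks rather than tensors still being alive.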
Nice! Thanks :) What is ooming?
Out Of Memory-ing.