
Comment by divyaprakash

16 days ago

Guilty as charged. I used Antigravity to handle the refactoring and docs so I could stay focused on the CUDA and VRAM orchestration.

wasmainiac 16 days ago

This isn’t a job interview, drop the corpo speak. What’s going on with CUDA and VRAM? We’re all friends here.

divyaprakash 16 days ago
Haha, fair enough. The actual internals are basically just one big fight with VRAM. I'm using decord to dump frames straight into GPU memory so the CPU doesn't bottleneck the pipeline. From there, everything (scene detection, HSV transforms, action scoring) is vectorized in torch, mostly in fp16 to avoid ooming. I also had to chunk the audio STFT/flux math because long files were just eating the card alive. The TTS model stays cached as a singleton so it's snappy after the first run, and I'm manually tracking 'Allocated vs Reserved' memory to keep it from choking. Still plenty of refinement left on the roadmap, but it's a fun weekend project to mess around with.
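
For anyone curious what "cached as a singleton" can look like, here is a minimal sketch, not the project's actual code. It assumes a zero-argument loader wrapped in `functools.lru_cache`; `FakeTTSModel` and `get_tts_model` are hypothetical names standing in for whatever the real model class and loader are.

```python
from functools import lru_cache


class FakeTTSModel:
    """Hypothetical stand-in for a real TTS model (assumption, not the project's class)."""

    load_count = 0  # how many times the expensive constructor has run

    def __init__(self) -> None:
        # A real model would load weights onto the GPU here, which is the slow part.
        FakeTTSModel.load_count += 1

    def synthesize(self, text: str) -> str:
        # Placeholder output; a real model would return audio samples.
        return f"<audio for {text!r}>"


@lru_cache(maxsize=1)
def get_tts_model() -> FakeTTSModel:
    """Build the model on the first call; later calls return the same cached instance."""
    return FakeTTSModel()


if __name__ == "__main__":
    first = get_tts_model()   # pays the load cost once
    second = get_tts_model()  # served from the cache
    print(first is second, FakeTTSModel.load_count)
```

The trade-off of this pattern is that the model's memory stays allocated for the life of the process, so it buys speed on repeat runs at the cost of a permanently occupied slice of VRAM.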
- wasmainiac 16 days ago
  
  Nice! Thanks :) What is ooming?
  