Comment by scottmf

4 months ago

I independently did the same with an MLX implementation on Sunday (also with Claude Code).

I expected this C implementation to be notably faster, but at 512x512 my M3 Max (36GB) barely made it past the first denoising step before OOMing.