Comment by scottmf
1 month ago
I independently did the same with an MLX implementation on Sunday (also with Claude Code).
I expected this C implementation to be notably faster, but my M3 Max (36GB) could barely make it past the first denoising step before OOMing (at 512x512).
Am I doing something wrong? The MLX implementation takes ~1 sec per step with the same model and dimensions: https://x.com/scottinallcaps/status/2013187218718753032