Comment by sebakubisz

8 hours ago

This is the kind of porting work I always hope for when I see a CUDA-only release. Have you thought about publishing the gather-scatter sparse 3D convolution and SDPA attention swaps as a standalone toolkit or writeup? A lot of folks running models locally on Apple Silicon hit the same wall with flash_attn, nvdiffrast, and custom sparse kernels and end up redoing the same work.



shivampkumar 7 hours ago

That makes a lot of sense. I'm looking into whether someone has already done this well. If not, I'll try to do it myself.