Comment by rahimnathwani

1 day ago

Has anyone successfully run this on a Mac? The installation instructions appear to assume an NVIDIA GPU (CUDA, FlashAttention), and I’m not sure whether it works with PyTorch’s Metal/MPS backend.
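For anyone wanting to check whether their PyTorch build actually exposes the Metal/MPS backend before attempting a run, a minimal device-selection sketch (the `pick_device` helper is my own, not from the project):

```python
import torch

def pick_device() -> torch.device:
    """Prefer Apple's Metal (MPS) backend, then CUDA, then CPU."""
    # torch.backends.mps exists on PyTorch >= 1.12; guard for older builds.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return torch.device("mps")
    if torch.cuda.is_available():
        return torch.device("cuda")
    return torch.device("cpu")

print(pick_device())
```

Note that MPS availability alone isn't sufficient here: if the model's inference code hard-requires FlashAttention (a CUDA-only kernel), it will still fail on a Mac regardless of what this returns.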

FWIW, you can run the demo without FlashAttention by passing the `--no-flash-attn` command-line parameter. I do that since I'm on Windows and haven't gotten FlashAttention 2 to work.

It seems to depend on FlashAttention, so the short answer is no. Hopefully someone will do the work of porting the inference code!