Comment by rahimnathwani
18 days ago
Has anyone successfully run this on a Mac? The installation instructions appear to assume an NVIDIA GPU (CUDA, FlashAttention), and I’m not sure whether it works with PyTorch’s Metal/MPS backend.
FWIW, you can run the demo without FlashAttention by passing the --no-flash-attn command-line parameter. I do that because I'm on Windows and haven't gotten FlashAttention2 to work.
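If you want to check which backend your own PyTorch install can see before deciding on that flag, here is a minimal sketch. The helper names pick_device and detect_device are made up for illustration and are not part of the project; torch.cuda.is_available() and torch.backends.mps.is_available() are standard PyTorch calls. It only reports what PyTorch detects and says nothing about whether the model itself runs correctly on MPS.

```python
def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Prefer CUDA, then Apple's MPS backend, then plain CPU."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


def detect_device() -> str:
    """Ask PyTorch which backends are usable; report CPU if torch is not installed."""
    try:
        import torch
    except ImportError:
        return "cpu"
    # torch.backends.mps is missing on very old PyTorch builds, so look it up defensively.
    mps = getattr(torch.backends, "mps", None)
    mps_ok = bool(mps is not None and mps.is_available())
    return pick_device(torch.cuda.is_available(), mps_ok)


if __name__ == "__main__":
    device = detect_device()
    # Assumption: FlashAttention does not run on MPS or CPU, so any non-CUDA
    # device would need the --no-flash-attn fallback mentioned above.
    print(f"device={device}, likely needs --no-flash-attn: {device != 'cuda'}")
```

On an Apple Silicon Mac with a recent PyTorch this should report mps. As my Windows case shows, a CUDA machine can still need the flag if FlashAttention won't install.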
It seems to depend on FlashAttention, so the short answer is no. Hopefully someone does the work of porting the inference code over!
Yes, using mlx-audio. See https://news.ycombinator.com/item?id=46726440
Thanks! Simon's example uses the custom voice model (creating a voice from instructions). But that comment led me eventually to this page, which shows how to use mlx-audio for custom voices:
https://huggingface.co/mlx-community/Qwen3-TTS-12Hz-0.6B-Bas...
I recommend using Modal for renting the metal.