Comment by rahimnathwani

1 month ago

Has anyone successfully run this on a Mac? The installation instructions appear to assume an NVIDIA GPU (CUDA, FlashAttention), and I’m not sure whether it works with PyTorch’s Metal/MPS backend.

5 comments

rahimnathwani

magicalhippo 1 month ago

FWIW you can run the demo without FlashAttention using --no-flash-attn command-line parameter, I do that since I'm on Windows and haven't gotten FlashAttention2 to work.

turnsout 1 month ago

It seems to depend on FlashAttention, so the short answer is no. Hopefully someone does the work of porting the inference code over!

Lichtso 1 month ago

Yes, using mlx-audio. See https://news.ycombinator.com/item?id=46726440

rahimnathwani 1 month ago

Thanks! Simon's example uses the custom voice model (creating a voice from instructions). But that comment led me eventually to this page, which shows how to use mlx-audio for custom voices:

https://huggingface.co/mlx-community/Qwen3-TTS-12Hz-0.6B-Bas...

  uv tool install --force git+https://github.com/Blaizzy/mlx-audio.git --prerelease=allow
    
  python -m mlx_audio.tts.generate --model mlx-community/Qwen3-TTS-12Hz-0.6B-Base-bf16 --text "Hello, this is a test." --ref_audio path_to_audio.wav --ref_text "Transcript of the reference audio." --play

javier123454321 1 month ago

I recommend using modal for renting the metal.