Comment by rahimnathwani

1 day ago

Has anyone successfully run this on a Mac? The installation instructions appear to assume an NVIDIA GPU (CUDA, FlashAttention), and I’m not sure whether it works with PyTorch’s Metal/MPS backend.
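For anyone wanting to check whether their PyTorch build actually exposes the Metal/MPS backend before attempting a run, a minimal device-selection sketch (the `pick_device` helper is my own, not from the project):

```python
import torch

def pick_device() -> torch.device:
    """Prefer Apple's Metal (MPS) backend, then CUDA, then CPU."""
    # torch.backends.mps exists on PyTorch >= 1.12; guard for older builds.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return torch.device("mps")
    if torch.cuda.is_available():
        return torch.device("cuda")
    return torch.device("cpu")

print(pick_device())
```

Note that MPS availability alone isn't sufficient here: if the model's inference code hard-requires FlashAttention (a CUDA-only kernel), it will still fail on a Mac regardless of what this returns.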

FWIW, you can run the demo without FlashAttention by passing the `--no-flash-attn` command-line parameter. I do that since I'm on Windows and haven't gotten FlashAttention 2 to work.

It seems to depend on FlashAttention, so the short answer is no. Hopefully someone will do the work of porting the inference code!