Comment by magicalhippo

1 day ago

I just followed the Quickstart[1] in the GitHub repo; refreshingly straightforward. Using the pip package worked fine, as did installing an editable version from the git repository. Just install the CUDA build of PyTorch[2] first.

The HF demo is very similar to the GitHub one, so it's easy to try out.

  pip install torch torchvision --index-url https://download.pytorch.org/whl/cu128
  pip install qwen3-tts
  qwen-tts-demo Qwen/Qwen3-TTS-12Hz-1.7B-Base --no-flash-attn --ip 127.0.0.1 --port 8000

That's for CUDA 12.8; change the PyTorch install URL accordingly.
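
Worth sanity-checking that the CUDA build of PyTorch is actually the one that got installed (a cached CPU wheel can silently win) before launching the demo:

  python -c "import torch; print(torch.__version__, torch.cuda.is_available())"

If that prints a version without the +cu128 suffix, or False, reinstall with the index URL above.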

I skipped FlashAttention since I'm on Windows and haven't gotten FlashAttention 2 to work there yet (I found some precompiled FA3 wheels[3], but Qwen3-TTS isn't FA3-compatible yet).
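
If you do want to test whether a flash-attn wheel works before dropping the --no-flash-attn flag, a quick import check is enough (the package imports as flash_attn; I'm assuming the wheel exposes __version__, as the PyPI package does):

  python -c "import flash_attn; print(flash_attn.__version__)"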

[1]: https://github.com/QwenLM/Qwen3-TTS?tab=readme-ov-file#quick...

[2]: https://pytorch.org/get-started/locally/

[3]: https://windreamer.github.io/flash-attention3-wheels/

It flat-out didn't work for me on mps. CUDA only until someone patches it.

  • The demo ran fine, if very slowly, in CPU-only mode using "--device cpu" for me. It defaults to CUDA, though.

    Try "--device mps", I guess; I saw multiple places in the code checking whether the device is not mps, so it seems like it should be supported. If not, fall back to CPU, as in the sketch below.
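
    A minimal fallback sketch along those lines (assuming the demo accepts whatever string you resolve here via --device; the cuda → mps → cpu order is my guess, not from the repo):

      import torch

      # Prefer CUDA, then Apple's MPS backend, else plain CPU.
      if torch.cuda.is_available():
          device = "cuda"
      elif torch.backends.mps.is_available():
          device = "mps"
      else:
          device = "cpu"

      print(f"launch with --device {device}")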