I just followed the Quickstart[1] in the GitHub repo, refreshingly straightforward. Using the pip package worked fine, as did installing the editable version from the git repository. Just install the CUDA version of PyTorch[2] first.
The HF demo is very similar to the GitHub demo, so easy to try out.
That's for CUDA 12.8; change the PyTorch install command accordingly.
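For reference, a quick sanity check before installing anything else. This is just generic PyTorch, not something from the Qwen3-TTS repo; it confirms you actually got a CUDA build rather than the CPU-only wheel:

    # Generic PyTorch check, not specific to Qwen3-TTS.
    import torch

    print(torch.__version__)          # e.g. ends in +cu128 for a CUDA 12.8 wheel
    print(torch.version.cuda)         # CUDA version the wheel was built against; None on CPU-only builds
    print(torch.cuda.is_available())  # True if a usable GPU and driver are present

If the last line prints False, you most likely got the CPU-only wheel and need to reinstall from the CUDA index URL on the PyTorch site.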
Skipped FlashAttention since I'm on Windows and I haven't gotten FlashAttention 2 to work there yet (I found some precompiled FA3 files[3] but Qwen3-TTS isn't FA3 compatible yet).
[1]: https://github.com/QwenLM/Qwen3-TTS?tab=readme-ov-file#quick...
[2]: https://pytorch.org/get-started/locally/
[3]: https://windreamer.github.io/flash-attention3-wheels/
https://github.com/sdbds/flash-attention-for-windows/release... - FA2 binaries for you
It flat-out didn't work for me on mps. CUDA only until someone patches it.
The demo ran fine for me, if very slowly, on CPU only using "--device cpu". It defaults to CUDA though.
Try using mps, I guess; I saw multiple references in the code checking whether the device is not mps, so it seems like it should be supported. If not, fall back to CPU.
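Rough sketch of the fallback order I mean, in plain PyTorch (not the project's actual device-selection code):

    # Generic device selection, not Qwen3-TTS's own logic:
    # prefer CUDA, then Apple's MPS backend, then plain CPU.
    import torch

    if torch.cuda.is_available():
        device = "cuda"
    elif torch.backends.mps.is_available():
        device = "mps"
    else:
        device = "cpu"

    print(device)

Then pass that as --device (or move the model to it manually) and see what breaks.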