
Comment by travisvn

2 days ago

Ok, here's a command that works.

I'm new to actually commenting on HN as opposed to just lurking, so I hope this formatting works..

  cat your_file.txt | python3 -c 'import sys, json; print(json.dumps({"input": sys.stdin.read()}))' | curl -X POST http://localhost:5123/v1/audio/speech \
    -H "Content-Type: application/json" \
    -d @- \
    --output speech.wav

Just replace the `your_file.txt` with.. well, you get it.

This'll hopefully handle any potential issues you'd have with quotes or other symbols breaking the JSON input.

Let me know how it goes!

Oh and you might want to change `python3` to `python` depending on your setup.
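If the shell pipe gets unwieldy, here's a rough pure-Python sketch of the same idea. It assumes the same endpoint as the `curl` command above and that the server returns the audio bytes directly in the response body. The names `build_payload` and `synthesize` are made up for illustration and are not part of any API.

```python
import json
import urllib.request

# Endpoint copied from the curl command above; adjust host/port for your setup.
API_URL = "http://localhost:5123/v1/audio/speech"


def build_payload(text: str) -> bytes:
    """JSON-encode the text so quotes, newlines, backslashes etc. are escaped."""
    return json.dumps({"input": text}).encode("utf-8")


def synthesize(txt_path: str, wav_path: str, url: str = API_URL) -> None:
    """Hypothetical helper: POST the file's text and save the response body."""
    with open(txt_path, encoding="utf-8") as f:
        payload = build_payload(f.read())

    req = urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # Assumption: the response body is the raw audio, as with curl's --output.
    with urllib.request.urlopen(req) as resp, open(wav_path, "wb") as out:
        out.write(resp.read())
```

With the server running, `synthesize("your_file.txt", "speech.wav")` should be roughly equivalent to the one-liner. The `json.dumps` step is what keeps stray quotes in the text from breaking the request body.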

> Just replace the `your_file.txt` with.. well, you get it.

> This'll hopefully handle any potential issues you'd have with quotes or other symbols breaking the JSON input.

> Let me know how it goes!

Wow. I'm humbled and grateful.

I'll update once I'm done with work and back in front of my home machine.

  • Hey — just pushed a big update that adds an (opt-in) frontend to test the API

    For now, there's just a textarea for input (so you'll have to copy the `.txt` contents), but it's a lot easier than trying to finagle it into a `curl` request

    Let me know if you have any issues!

    • (Didn't carefully read your reply. What follows are the results of cat-ing a text file in the CLI. Will give the new textbox a whirl in the morning PDT. A truly heartfelt thanks for helping me work with Chatterbox TTS!)

      Absolutely blown away.

      I fed it the first page of Gibson's "Neuromancer" and your incantation worked like a charm. Thanks for the shell script pipe mojo.

      Some other details:

        - 3:01 (3 mins, 1 sec) of generated .wav took 4:28 to process
        - running on M4 Max with 128GB RAM
        - Chatterbox TTS inserted a few strange artifacts which sounded like air venting, machine whirring, and vehicles passing. Very odd and, oddly, apropos for cyberpunk.
        - Chatterbox TTS managed to enunciate the dialog _as_ dialog, even going so far as to mimic an Australian accent where the speaker was identified as such. (This might be the effect of wishful listening.)
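For anyone comparing hardware, the timing above works out to a bit under 1.5x real time. A quick sketch of the arithmetic, using only the two durations quoted in the list (the helper names are made up for illustration):

```python
def to_seconds(mmss: str) -> int:
    """Convert an 'M:SS' string into a number of seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def realtime_factor(processing: str, audio: str) -> float:
    """Processing time divided by audio length; above 1.0 means slower than real time."""
    return to_seconds(processing) / to_seconds(audio)


rtf = realtime_factor("4:28", "3:01")  # 268 s of processing for 181 s of audio
print(f"{rtf:.2f}x real time")  # prints "1.48x real time"
```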
      

      I am astounded.
