Comment by mistersquid

3 days ago

(Didn't carefully read your reply. What follows are the results of cat-ing a text file in the CLI. Will give the new textbox a whirl in the morning PDT. A truly heartfelt thanks for helping me work with Chatterbox TTS!)

Absolutely blown away.

I fed it the first page of Gibson's "Neuromancer" and your incantation worked like a charm. Thanks for the shell script pipe mojo.

Some other details:

  - 3:01 (3 mins, 1 sec) of generated .wav took 4:28 to process
  - running on M4 Max with 128GB RAM
  - Chatterbox TTS inserted a few strange artifacts which sounded like air venting, machine whirring, and vehicles passing. Very odd and, oddly, apropos for cyberpunk.
  - Chatterbox TTS managed to enunciate the dialog _as_ dialog, even going so far as to mimic an Australian accent where the speaker was identified as such. (This might be the effect of wishful listening.)
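
For a rough sense of throughput, the timing in the first bullet can be turned into a real-time factor. This is only a back-of-the-envelope sketch: the `toSeconds` helper and variable names are made up for illustration, and the two durations are the ones quoted above.

```typescript
// Convert an "m:ss" duration string to seconds (hypothetical helper).
function toSeconds(mmss: string): number {
  const [minutes, seconds] = mmss.split(":").map(Number);
  return minutes * 60 + seconds;
}

const audioSeconds = toSeconds("3:01");      // length of the generated .wav
const processingSeconds = toSeconds("4:28"); // wall-clock time to generate it

// Real-time factor: seconds of processing per second of generated audio.
const realTimeFactor = processingSeconds / audioSeconds;

console.log(realTimeFactor.toFixed(2)); // prints "1.48"
```

So generation ran at roughly 1.5x the playback length, i.e. somewhat slower than real time on this machine.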

I am astounded.

An M4 Max with 128GB RAM? drools

What did your `it/s` end up looking like with that setup? MLX is fascinating to me. Apple made a really smart decision with the introduction of its M-series.

With regard to the artifacts: this is definitely a known issue with Chatterbox. I'm unsure where the current investigation into fixing it stands (or what the "tricks" are to avoid it), but it's definitely eerie, among other things.

I appreciate your feedback through all of this!

Would love to have you on the Discord to keep in touch https://chatterboxtts.com/discord

  • I'll follow up on Discord!

    For those following along at home: the frontend works (and is quite nice) after adding a proxy to `vite.config.ts`:

      server: {
        proxy: {
          // Proxy all API requests to the FastAPI backend
          '/v1': 'http://localhost:4123',
        },
      },
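
    For anyone unsure where that fragment goes, here is a minimal sketch of a complete `vite.config.ts`. It assumes the backend is listening on port 4123 as above; the object form with `changeOrigin` is an optional variation rather than what the project ships, and any `plugins` entry already in your config should stay where it is.

    ```typescript
    // vite.config.ts (sketch; keep whatever plugins your existing config has)
    import { defineConfig } from 'vite';

    export default defineConfig({
      server: {
        proxy: {
          // Forward every request under /v1 to the FastAPI backend
          '/v1': {
            target: 'http://localhost:4123',
            // Rewrite the Host header to match the target (assumption: harmless here)
            changeOrigin: true,
          },
        },
      },
    });
    ```

    The proxy only applies to the Vite dev server, so restart `vite` after editing the config.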