← Back to context

Comment by travisvn

3 days ago

Chatterbox is fantastic.

I created an API wrapper that also makes installation easier (Dockerized as well) https://github.com/travisvn/chatterbox-tts-api/

Best voice cloning option available locally by far, in my experience.

10 comments

travisvn

Reply

mistersquid 2 days ago

> Chatterbox is fantastic.

> I created an API wrapper that also makes installation easier (Dockerized as well) https://github.com/travisvn/chatterbox-tts-ap

Gave your wrapper a try and, wow, I'm blown away by both Chatterbox TTS and your API wrapper.

Excuse the rudimentary level of what follows.

Was looking for a quick and dirty CLI incantation to specify a local text file instead of the inline `input` object, but couldn't figure it.

Pointers much appreciated.

travisvn 2 days ago
This API wrapper was initially made to support a particular use case where someone's running, say, Open WebUI or AnythingLLM or some other local LLM frontend.
A lot of these frontends have an option for using OpenAI's TTS API, and some of them allow you to specify the URL for that endpoint, allowing for "drop-in replacements" like this project.
So the speech generation endpoint in the API is designed to fill that niche. However, its usage is pretty basic and there are curl statements in the README for testing your setup.
Anyway, to get to your actual question, let me see if I can whip something up. I'll edit this comment with the command if I can swing it.
In the meantime, can I assume your local text files are actual `.txt` files?
- mistersquid 2 days ago
  
  This is way more of a response than I could have even hoped for. Thank you so much.
  To answer your question, yes, my local text files are .txt files.
  
  5 replies →

venusenvy47 2 days ago

Would this be usable on a PC without a GPU?

travisvn 2 days ago

It can definitely run on CPU — but I'm not sure if it can run on a machine without a GPU entirely.
To be honest, it uses a decently large amount of resources. If you had a GPU, you could expect about 4-5 gb memory usage. And given the optimizations for tensors on GPUs, I'm not sure how well things would work "CPU only".
If you try it, let me know. There are some "CPU" Docker builds in the repo you could look at for guidance.
If you want free TTS without using local resources, you could try edge-tts https://github.com/travisvn/openai-edge-tts