Comment by skygazer

4 days ago

If you're willing to forgo the interactive LLM bit, kokoro-tts (just a script using Kokoro-ONNX) takes epubs and outputs a series of wavs or mp3s that need to be stitched together into chapters or an audiobook m4a with some ffmpeg fu. I've listened to several generated audiobooks and found them pretty good, with some nice generic narration-like prosody. It uses espeak-ng to generate phonemes and passes those to the model to render the voice, so it generally pronounces things quite well. It comes with a handful of nice voices, and several can be blended, but as far as I'm aware there's no easy voice cloning like Chatterbox offers.

https://github.com/nazdridoy/kokoro-tts/blob/main/kokoro-tts
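For the ffmpeg stitching step, here is a minimal sketch of one way to do it, using ffmpeg's concat demuxer. The file names (`chunk_1.wav`, `chapter_01.txt`, `chapter_01.m4a`), the helper names, and the 64k AAC bitrate are all assumptions for illustration, not what kokoro-tts itself produces or recommends. The snippet only builds the list file contents and the command; it does not run ffmpeg.

```python
import re
from pathlib import Path


def natural_key(name):
    # Split into digit and non-digit runs so "chunk_2" sorts before "chunk_10".
    return [int(tok) if tok.isdigit() else tok.lower()
            for tok in re.split(r"(\d+)", name)]


def concat_list(paths):
    # Build the text of an ffmpeg concat-demuxer list file, one "file '...'"
    # line per input, in natural (human) order of the file names.
    lines = []
    for p in sorted(paths, key=lambda p: natural_key(Path(p).name)):
        escaped = str(p).replace("'", "'\\''")  # escape single quotes
        lines.append(f"file '{escaped}'")
    return "\n".join(lines) + "\n"


def ffmpeg_command(list_file, output):
    # Hypothetical settings: re-encode the joined audio to AAC at 64k.
    # "-safe 0" lets the list file reference arbitrary paths.
    return ["ffmpeg", "-f", "concat", "-safe", "0",
            "-i", str(list_file), "-c:a", "aac", "-b:a", "64k", str(output)]


if __name__ == "__main__":
    chunks = ["chunk_10.wav", "chunk_2.wav", "chunk_1.wav"]  # hypothetical names
    print(concat_list(chunks), end="")
    print(" ".join(ffmpeg_command("chapter_01.txt", "chapter_01.m4a")))
```

To use it, write the output of `concat_list` to the list file and pass the command to `subprocess.run`. Relative paths in the list are resolved against the list file's directory, so it is simplest to keep the list next to the audio chunks.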

I've used this repo and it's great. It was one of many things that inspired me to build a similar tool: https://desktop.with.audio

It was important to me that it be 100% private and local, and I wanted it to be a one-time payment. Because it processes your data locally, it can be a one-time-payment text-to-speech app.

If you are interested in creating audiobooks from epubs, check this demo: https://www.youtube.com/watch?v=pOHzo6Oq0lQ

If you are interested in listening while reading with text highlighting, check these demos:
- https://www.youtube.com/watch?v=8yJ-lsbzAuw
- https://www.youtube.com/watch?v=y8wi4d8xmnw

I've been using epub2tts / epub2tts-edge and it's been working well for me. It converts epubs into m4b.