Comment by abraxas
3 days ago
Are these things good enough to narrate a book convincingly or does the voice lose coherence after a few paragraphs being spoken?
Most of these TTS systems tend to fall apart the longer the text - it's a good idea to just wrap any longform text into separate paragraph segmented batches and then stitch them back together again at the end.
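For anyone who wants to try the batching approach, here's a rough sketch using only the Python standard library. It assumes paragraphs are separated by blank lines and that every generated clip is a WAV with the same sample rate, channel count, and sample width. The names `split_into_batches`, `stitch_wavs`, and the `max_chars` limit are made up for illustration and aren't part of Chatterbox or any other TTS package.

```python
import io
import re
import wave


def split_into_batches(text: str, max_chars: int = 1000) -> list[str]:
    """Group blank-line-separated paragraphs into batches of at most max_chars.

    A single paragraph longer than max_chars is kept whole as its own batch.
    """
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    batches: list[str] = []
    current = ""
    for para in paragraphs:
        candidate = f"{current}\n\n{para}" if current else para
        if current and len(candidate) > max_chars:
            # Adding this paragraph would overflow: close the current batch.
            batches.append(current)
            current = para
        else:
            current = candidate
    if current:
        batches.append(current)
    return batches


def stitch_wavs(wav_blobs: list[bytes]) -> bytes:
    """Concatenate in-memory WAV files that all share the same audio format."""
    if not wav_blobs:
        raise ValueError("need at least one WAV clip to stitch")
    out = io.BytesIO()
    with wave.open(out, "wb") as writer:
        expected = None
        for blob in wav_blobs:
            with wave.open(io.BytesIO(blob), "rb") as reader:
                fmt = (reader.getnchannels(), reader.getsampwidth(), reader.getframerate())
                if expected is None:
                    expected = fmt
                    writer.setnchannels(fmt[0])
                    writer.setsampwidth(fmt[1])
                    writer.setframerate(fmt[2])
                elif fmt != expected:
                    raise ValueError(f"clip format {fmt} differs from {expected}")
                writer.writeframes(reader.readframes(reader.getnframes()))
    return out.getvalue()
```

You'd call whatever TTS function you're using once per batch (say, a hypothetical `synthesize(batch) -> bytes`) and pass the resulting clips to `stitch_wavs` in order. Splitting on paragraph boundaries rather than at a fixed character offset keeps each clip ending on a natural pause, so the joins are less audible.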
I've also found that if your one-shot sample wave isn't really clean that sometimes Chatterbox produces random unholy whooshing sounds at the end of the generated audio which is an added bonus if you're recording Dante's Inferno.
Yes, I've generated an audiobook of an epub using this tool and the result was passable: https://github.com/santinic/audiblez
Regarding your example "On a Google Colab's T4 GPU via Cuda, it takes about 5 minutes to convert "Animal's Farm"", do you know the approximate cost to perform this? I've only used Colab at the free level, so I have no concept of the costs for GPU time.
Once it's good enough Audible will be flooded with AI-narrated books so we'll know soon. (The only question is whether Amazon would disclose it, ofc)
Audible has already flooded their store with generated audio books. Go to the "Plus Catalog" and it's filled with them. The quality at the moment is complete trash, but I can't imagine it won't get better quickly.
The whole audiobook business will eventually disappear - probably within the decade. There will only be ebooks and on-device AI assistants will read it to you on demand.
I imagine it'll go like this: First pre-generated audiobooks as audio files. Next, online service to generate audio on demand with hyper customizable voices which can be downloaded. Next, a new ebook format which embeds instructions for narration and pronunciation to be read on-device. Finally, AI that's good enough to read it like a storyteller instantly without hints.
> There will only be ebooks and on-device AI assistants will read it to you on demand.
Honestly I read (or rather, listen to) a lot of books already by getting the epubs onto my phone then using a very basic TTS to read it out. Yes, they're definitely not as lifelike as even the most common AI TTS systems but they're good enough to listen to at high speed. Moon+ Reader is pretty good for Android, not sure about iOS.
The flip side is that being able to auto-generate an audiobook for a book that doesn't have one (or to use an existing ebook rather than paying Audible $30 for their version) is a legit improvement if the result is "good enough". AI-generated isn't as good, but it's better than nothing. Also, being able to interrupt and ask for more detail/context would be pretty nice. Like I'm reading some Pynchon and I have to stop sometimes and look up the name of a reference to some product nobody knows now, stuff like that.
If you're willing to forgo the interactive LLM bit, kokoro-tts (just a script using Kokoro-ONNX) takes epubs and outputs a series of wavs or mp3s that need to be stitched together into chapters or audiobook m4a with some ffmpeg fu. I've listened to several generated audiobooks, and found them pretty good. Some nice generic narration-like prosody. It uses espeak-ng to generate phonemes and passes those to the model to render voice, so it generally pronounces things quite well. It comes with a handful of nice voices and several can be blended, but no easy voice cloning, like chatterbox, that I'm aware of.
https://github.com/nazdridoy/kokoro-tts/blob/main/kokoro-tts
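In case the "ffmpeg fu" part is unclear, here's roughly what I mean, sketched in Python. It relies on ffmpeg's concat demuxer (`-f concat -safe 0 -i list.txt`), which reads a text file of `file '...'` lines. The helper names, the AAC codec, and the 64k bitrate are my own choices for illustration, not anything kokoro-tts prescribes, and you need to pass the wav paths in the order you want them read.

```python
import subprocess
from pathlib import Path


def concat_list_text(wav_paths: list[str]) -> str:
    """Build the contents of an ffmpeg concat-demuxer list file."""
    lines = []
    for path in wav_paths:
        # Inside single quotes, a literal quote is written as '\''
        escaped = str(path).replace("'", "'\\''")
        lines.append(f"file '{escaped}'")
    return "\n".join(lines) + "\n"


def build_ffmpeg_command(list_path: str, output_path: str) -> list[str]:
    """Return the ffmpeg argv that joins the listed clips into one AAC file."""
    return [
        "ffmpeg",
        "-f", "concat",      # use the concat demuxer
        "-safe", "0",        # allow arbitrary paths in the list file
        "-i", str(list_path),
        "-c:a", "aac",       # re-encode to AAC for an .m4a container
        "-b:a", "64k",       # assumed bitrate; plenty for mono speech
        str(output_path),
    ]


def stitch_chapter(wav_paths: list[str], output_path: str) -> None:
    """Write the list file next to the output and run ffmpeg on it."""
    list_path = Path(output_path).with_suffix(".txt")
    list_path.write_text(concat_list_text(wav_paths), encoding="utf-8")
    subprocess.run(build_ffmpeg_command(str(list_path), output_path), check=True)
```

Something like `stitch_chapter(["part_001.wav", "part_002.wav"], "chapter_01.m4a")` gives you one file per chapter (those filenames are placeholders). Relative paths in the list file are resolved relative to the list file itself, so keep it in the same directory as the clips or use absolute paths.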
I think you're a bit behind on it: https://www.audible.com/about/newsroom/audible-expands-catal...
It's watermarked.
It's open source. It's not in the model. The watermark function is added to show you how to use it. You can just remove it.
``` watermarked_wav = self.watermarker.apply_watermark(... ```
I consult for a company in the space (not Resemble) and I can definitely say it can narrate a book.
A year ago, for fun, I gave a friend a Carl Rogers therapy audiobook with an Attenborough-esque reading I had generated. It was pretty good even then, so it should be better now.