Comment by raincole
3 days ago
Once it's good enough Audible will be flooded with AI-narrated books so we'll know soon. (The only question is whether Amazon would disclose it, ofc)
Audible has already flooded their store with generated audio books. Go to the "Plus Catalog" and it's filled with them. The quality at the moment is complete trash, but I can't imagine it won't get better quickly.
The whole audiobook business will eventually disappear - probably within the decade. There will only be ebooks and on-device AI assistants will read it to you on demand.
I imagine it'll go like this: First pre-generated audiobooks as audio files. Next, online service to generate audio on demand with hyper customizable voices which can be downloaded. Next, a new ebook format which embeds instructions for narration and pronunciation to be read on-device. Finally, AI that's good enough to read it like a storyteller instantly without hints.
> There will only be ebooks and on-device AI assistants will read it to you on demand.
Honestly I read (or rather, listen to) a lot of books already by getting the epubs onto my phone then using a very basic TTS to read it out. Yes, they're definitely not as lifelike as even the most common AI TTS systems but they're good enough to listen to at high speed. Moon+ Reader is pretty good for Android, not sure about iOS.
The flip side: if I can auto-generate an audiobook for a book that doesn't have one (or use an existing ebook rather than paying Audible $30 for their version) and the result is "good enough", that's a legit improvement. AI-generated isn't as good, but it's better than nothing. Also, being able to interrupt and ask for more detail/context would be pretty nice. Like, I'm reading some Pynchon and I have to stop sometimes and look up a reference to some product nobody knows now, stuff like that.
If you're willing to forgo the interactive LLM bit, kokoro-tts (just a script using Kokoro-ONNX) takes epubs and outputs a series of wavs or mp3s that need to be stitched together into chapters or an audiobook m4a with some ffmpeg fu. I've listened to several generated audiobooks and found them pretty good, with some nice generic narration-like prosody. It uses espeak-ng to generate phonemes and passes those to the model to render the voice, so it generally pronounces things quite well. It comes with a handful of nice voices, and several can be blended, but there's no easy voice cloning like chatterbox has, as far as I'm aware.
https://github.com/nazdridoy/kokoro-tts/blob/main/kokoro-tts
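For anyone wondering what the "ffmpeg fu" might look like, here's a rough sketch using ffmpeg's concat demuxer. The `chapter_N.wav` file names, the 64k bitrate, and the helper names are my own assumptions for illustration, not anything kokoro-tts documents; adjust to whatever the script actually writes out.

```python
import re
from pathlib import Path


def natural_key(path):
    """Sort key so that chapter_2 sorts before chapter_10 (by file name only)."""
    return [int(tok) if tok.isdigit() else tok.lower()
            for tok in re.split(r"(\d+)", Path(path).name)]


def concat_list(paths):
    """Return the text of a list file for ffmpeg's concat demuxer."""
    lines = []
    for p in sorted(paths, key=natural_key):
        # Inside single quotes, a literal ' is written as '\''
        escaped = str(p).replace("'", "'\\''")
        lines.append(f"file '{escaped}'")
    return "\n".join(lines) + "\n"


def ffmpeg_command(list_file, output):
    """Build (but do not run) an ffmpeg command that joins the listed files.

    Re-encodes to AAC so the result fits in an .m4a container;
    the 64k bitrate is an arbitrary choice for speech.
    """
    return ["ffmpeg", "-f", "concat", "-safe", "0", "-i", str(list_file),
            "-c:a", "aac", "-b:a", "64k", str(output)]
```

To use it, write the output of `concat_list(...)` to something like `list.txt`, then hand `ffmpeg_command("list.txt", "book.m4a")` to `subprocess.run(..., check=True)`. The natural sort matters because a plain alphabetical sort would put chapter 10 before chapter 2.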
I've used this repo and it's great. It was one of many things that inspired me to build a similar tool. I built https://desktop.with.audio
It was important to me that it be 100% private and local, and I wanted it to be a one-time payment. Because it processes your data locally, it can be a one-time-payment text-to-speech app.
If you are interested in creating audiobooks from epubs, check this demo: https://www.youtube.com/watch?v=pOHzo6Oq0lQ
If you are interested in listening while reading with text highlighting, check these demos:
- https://www.youtube.com/watch?v=8yJ-lsbzAuw
- https://www.youtube.com/watch?v=y8wi4d8xmnw
audiblez[1] does exactly that and handles the ffmpeg fu part for you. It outputs an m4b file, which audiobook players support.
1. https://github.com/santinic/audiblez
I've been using epub2tts / epub2tts-edge and it's been working well for me. It converts to m4b.
I think you're a bit behind on it: https://www.audible.com/about/newsroom/audible-expands-catal...
It's watermarked.
It's open source. It's not in the model. The watermark function is added to show you how to use it. You can just remove it.
```
watermarked_wav = self.watermarker.apply_watermark(...
```