Comment by maxkfranz

10 days ago

Yeah, it’s not going to compare to Codex-5.2 or Opus 4.5.

Some non-programming use cases are interesting though, e.g. text to speech or speech to text.

Run a TTS model overnight on a book, and in the morning you’ll get an audiobook. With a simple approach, you’d get something more like the old books on tape (e.g. no chapter skipping), but regardless, it’s a valid use case.

Which TTS would you suggest? Is there anything out there that properly handles modulation, punctuation, and the overall 'mood' of a sentence? I've been looking for something easy to set up, but most options are either extremely complex or produce relatively poor output.

  • I’m still experimenting with them. I suspect you may have to synthesize one paragraph at a time and concatenate the resulting audio. Let me know if you’d be interested in collaborating, as I’m interested in this use case too.
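
For what it's worth, the paragraph-at-a-time idea could look roughly like the sketch below. It only uses the Python standard library. `synthesize` is a hypothetical stand-in that emits silence; you would swap in a call to whichever TTS model you end up using. The sample rate, the 16-bit mono PCM format, and the half-second pause between paragraphs are all assumptions, not something any particular model requires.

```python
import io
import re
import wave

SAMPLE_RATE = 22050  # assumed output rate of the (hypothetical) TTS model


def split_paragraphs(text):
    """Split on blank lines and drop empty chunks."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def synthesize(paragraph):
    """Hypothetical placeholder for a real TTS call.

    Returns 16-bit mono PCM bytes. This stub emits silence,
    10 ms per character, so the pipeline can run end to end.
    """
    n_samples = len(paragraph) * SAMPLE_RATE // 100
    return b"\x00\x00" * n_samples


def book_to_wav(text, pause_seconds=0.5):
    """Synthesize each paragraph separately and join them into one WAV."""
    pause = b"\x00\x00" * int(SAMPLE_RATE * pause_seconds)
    buf = io.BytesIO()
    with wave.open(buf, "wb") as out:
        out.setnchannels(1)   # mono
        out.setsampwidth(2)   # 16-bit samples
        out.setframerate(SAMPLE_RATE)
        for paragraph in split_paragraphs(text):
            out.writeframes(synthesize(paragraph))
            out.writeframes(pause)  # short gap between paragraphs
    return buf.getvalue()


if __name__ == "__main__":
    sample = "First paragraph of the book.\n\nSecond paragraph."
    with open("audiobook.wav", "wb") as f:
        f.write(book_to_wav(sample))
```

Working one paragraph at a time keeps each model call short, and if a call fails overnight you only need to redo that paragraph rather than the whole book.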