Comment by 1bpp

3 days ago

How would this prevent someone from just plugging ElevenLabs into it? Or the inevitable more realistic voice models? Or just a prerecorded spam message? It's already nearly impossible to tell if some speech is human or not. I do like the idea of recovering the emotional information lost in speech -> text, but I don't think it'd help the LLM issue.

Detecting "human speech" means shutting out people who cannot speak and rely on TTS for verbal communication.

  • Also speech impediments, accents, physical disabilities, etc etc.

    Tech culture just refuses to even be aware of people as physical beings. It's just spherical users in a vacuum and if you don't fit the mold, tough.

Or a genuine human voice reading a script that's partially or almost entirely LLM-written? There must be some video content creators who already do that.

True. However, voice input has higher friction than typing "chatgpt, write me a reply."