Comment by 1bpp
3 days ago
How would this prevent someone from just plugging ElevenLabs into it? Or the inevitable more realistic voice models? Or just a prerecorded spam message? It's already nearly impossible to tell if some speech is human or not. I do like the idea of recovering the emotional information lost in speech -> text, but I don't think it'd help the LLM issue.
Detecting "human speech" means shutting out people who cannot speak and rely on TTS for verbal communication.
Also speech impediments, accents, physical disabilities, etc etc.
Tech culture just refuses to even be aware of people as physical beings. It's just spherical users in a vacuum and if you don't fit the mold, tough.
Or a genuine human voice reading a script that's partially or almost entirely LLM-written? I think there must be some video content creators who do that.
True. However, voice input has higher friction than typing "chatgpt write me a reply".