Comment by ripped_britches
16 hours ago
Speech-to-speech is not nearly as good as LiveKit IMO (the "old school" sequence of transcribe, LLM, synthesize). It depends on what you're doing, of course, but this is just because the LLMs are way smarter than the speech-to-speech models, which are pretty much the worst (again, IMO) at anything beyond basic banter. And LiveKit is just a framework, so you can hook it up with any models in the stack. I'm not an expert on the local parts, but I would assume this is pretty easy to glue together.
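For anyone unfamiliar with the cascade being described, here is a minimal sketch of the transcribe -> LLM -> synthesize flow. The STT/LLM/TTS classes and handle_utterance below are hypothetical placeholders standing in for whichever models you plug into a framework like LiveKit; they are not its actual API, just an illustration of the pipeline shape and where the latency adds up.

```python
# Sketch of the "old school" cascaded voice pipeline
# (transcribe -> LLM -> synthesize). The stage classes are hypothetical
# placeholders, not a real LiveKit API; a framework like LiveKit mainly
# handles audio transport and lets you swap in any model at each stage.
import asyncio


class STT:
    """Placeholder speech-to-text stage (e.g. a local Whisper model)."""
    async def transcribe(self, audio_chunk: bytes) -> str:
        await asyncio.sleep(0.05)          # stand-in for inference latency
        return "turn off the kitchen lights"


class LLM:
    """Placeholder language-model stage; this is where the 'smarts' live."""
    async def respond(self, text: str) -> str:
        await asyncio.sleep(0.2)           # stand-in for inference latency
        return f"Okay, turning off the kitchen lights. (heard: {text})"


class TTS:
    """Placeholder text-to-speech stage."""
    async def synthesize(self, text: str) -> bytes:
        await asyncio.sleep(0.1)           # stand-in for inference latency
        return text.encode()               # stand-in for audio samples


async def handle_utterance(audio_chunk: bytes) -> bytes:
    """One turn of the cascade. End-to-end latency is roughly the sum of
    the three stages, which is why streaming each stage matters if you
    want an Alexa-style assistant."""
    stt, llm, tts = STT(), LLM(), TTS()
    transcript = await stt.transcribe(audio_chunk)
    reply_text = await llm.respond(transcript)
    return await tts.synthesize(reply_text)


if __name__ == "__main__":
    audio_out = asyncio.run(handle_utterance(b"\x00" * 320))
    print(audio_out.decode())
```

The point of the sketch is that each stage is independently swappable, and the turn latency is the sum of the stages, which is what the latency discussion below is about.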
They work for two entirely different things. The problem with these pipelines is that unless the latency is very low, they simply aren't suitable replacements for Alexa etc. For that use case, low latency beats smarts.
The latency is very, very low in my experience; it would definitely work well as an Alexa-style assistant.