Comment by schobi
10 hours ago
Oh... Having a local-only voice assistant would be great. Maybe someone can share the practical side of this.
Do you have the GPU running all day at 200W to scan for wake words? Or is that running on the machine you are working on anyway?
Is this running from a headset microphone (while sitting at the desk?) or more like a USB speakerphone? Or is there an Alexa jailbreak / alternative firmware to use as a frontend, with this running on a GPU hidden away?
I recently trained an STT model which detects about 40 words - the model is less than <hold your breath> 50 kilobytes. It can run on a <$1 chip.
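To give a feel for why 40 keywords fit in under 50 kB: here's a back-of-envelope parameter count for a tiny depthwise-separable CNN over MFCC features, a common keyword-spotting shape. The layer sizes are my own illustrative guesses, not the parent's actual architecture:

```python
# Rough weight-count sketch for a tiny keyword-spotting net
# (hypothetical layer sizes; at int8 that's ~1 byte per weight).

def conv2d_params(in_ch, out_ch, kh, kw):
    # full conv: weights + one bias per output channel
    return in_ch * out_ch * kh * kw + out_ch

def dw_separable_params(in_ch, out_ch, kh, kw):
    # depthwise kxk conv followed by pointwise 1x1 conv
    depthwise = in_ch * kh * kw + in_ch
    pointwise = in_ch * out_ch + out_ch
    return depthwise + pointwise

params = conv2d_params(1, 64, 10, 4)       # input conv over 49x10 MFCC frames
for _ in range(4):                         # four depthwise-separable blocks
    params += dw_separable_params(64, 64, 3, 3)
params += 64 * 40 + 40                     # final FC layer to 40 keyword classes

print(params)            # → 24424 weights
print(params / 1024)     # ~24 kB quantized to int8, well under 50 kB
```

So even with headroom for activations and code, a model like this comfortably fits the flash and RAM of a sub-dollar microcontroller.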
Wake words are generally processed extremely early in the pipeline. So if you capture audio with, say, an ESP32, the uC does the wake word watching.
There are even microphone ADCs and DSPs (if you use a mic that outputs PCM/I2S instead of analog) that do the processing internally.
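The split described above looks roughly like this in code: a cheap always-on stage scores a sliding window of mic frames, and only a wake-word hit hands audio to the heavy STT stage (so the GPU can sleep the rest of the time). This is a hypothetical sketch; `score_fn` stands in for whatever tiny on-chip model you run:

```python
from collections import deque

def listen(mic_frames, score_fn, threshold=0.8, window=50):
    """Always-on front stage: keep ~1 s of 20 ms frames in a ring
    buffer, score it with the tiny wake-word model, and only yield
    audio downstream (to the big model) on a detection."""
    ring = deque(maxlen=window)
    for frame in mic_frames:
        ring.append(frame)
        if len(ring) == window and score_fn(ring) > threshold:
            yield list(ring)   # hand this chunk to the heavy STT stage
            ring.clear()       # don't re-trigger on the same audio
```

The point is that `score_fn` is the only thing running 24/7, and it's small enough for the microcontroller (or even the mic's DSP); everything after the `yield` can live on a machine that's otherwise idle.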