Comment by schobi
10 hours ago
Oh... Having a local-only voice assistant would be great. Maybe someone can share the practical side of this.
Do you have the GPU running all day at 200W to scan for wake words? Or is that running on the machine you are working on anyway?
Is this running from a headset microphone (while sitting at the desk?) or more like a USB speakerphone? Or is there an Alexa jailbreak / alternative firmware to use as a frontend, with this running on a GPU hidden away?
I recently trained an STT model which detects about 40 words - the model is less than <hold your breath> 50 kilobytes. It can run on a <$1 chip.
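To give a feel for why 40 keywords fit in under 50 kB: here's a back-of-envelope parameter count for a tiny depthwise-separable CNN over MFCC features, a common keyword-spotting shape. The layer sizes are my own illustrative guesses, not the parent's actual architecture:

```python
# Rough weight-count sketch for a tiny keyword-spotting net
# (hypothetical layer sizes; at int8 that's ~1 byte per weight).

def conv2d_params(in_ch, out_ch, kh, kw):
    # full conv: weights + one bias per output channel
    return in_ch * out_ch * kh * kw + out_ch

def dw_separable_params(in_ch, out_ch, kh, kw):
    # depthwise kxk conv followed by pointwise 1x1 conv
    depthwise = in_ch * kh * kw + in_ch
    pointwise = in_ch * out_ch + out_ch
    return depthwise + pointwise

params = conv2d_params(1, 64, 10, 4)       # input conv over 49x10 MFCC frames
for _ in range(4):                         # four depthwise-separable blocks
    params += dw_separable_params(64, 64, 3, 3)
params += 64 * 40 + 40                     # final FC layer to 40 keyword classes

print(params)            # → 24424 weights
print(params / 1024)     # ~24 kB quantized to int8, well under 50 kB
```

So even with headroom for activations and code, a model like this comfortably fits the flash and RAM of a sub-dollar microcontroller.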
Wake words are generally processed extremely early in the pipeline. So if you capture audio with, say, an ESP32, the uC does the wake word watching.
There are even microphone ADCs and DSPs (if you use a mic that outputs PCM/I2S instead of analog) that do the processing internally.
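The split described above looks roughly like this in code: a cheap always-on stage scores a sliding window of mic frames, and only a wake-word hit hands audio to the heavy STT stage (so the GPU can sleep the rest of the time). This is a hypothetical sketch; `score_fn` stands in for whatever tiny on-chip model you run:

```python
from collections import deque

def listen(mic_frames, score_fn, threshold=0.8, window=50):
    """Always-on front stage: keep ~1 s of 20 ms frames in a ring
    buffer, score it with the tiny wake-word model, and only yield
    audio downstream (to the big model) on a detection."""
    ring = deque(maxlen=window)
    for frame in mic_frames:
        ring.append(frame)
        if len(ring) == window and score_fn(ring) > threshold:
            yield list(ring)   # hand this chunk to the heavy STT stage
            ring.clear()       # don't re-trigger on the same audio
```

The point is that `score_fn` is the only thing running 24/7, and it's small enough for the microcontroller (or even the mic's DSP); everything after the `yield` can live on a machine that's otherwise idle.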