Comment by sizzle
1 year ago
Your voice is unique and can be fingerprinted to ID you (see Alexa devices). Add in things like sentiment analysis, changes in vocal inflection/intonation, and the context around spoken product names (purchase inference/intent), and you can probably triangulate a threshold for showing products with a high likelihood of purchase intent (a toy sketch of that kind of scoring is below).
Really smart people have been working on these things at Google for decades, and that's barely scratching the surface of this nuanced discussion. CPUs/GPUs have only gotten faster and smaller, with more RAM available and better power management across the board for mobile devices.
Anything is possible if there is money to be made and it's not explicitly illegal, or better yet, they can just pay the fines after making their 100x ROI.
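To make that concrete, here is a toy Python sketch of what triangulating such a threshold could look like; every feature, weight, and cut-off below is invented for illustration and not taken from any real ad system.

```python
# Toy illustration of combining audio-derived signals into a purchase-intent
# score with a threshold. Features, weights and threshold are all made up.

from dataclasses import dataclass

@dataclass
class AudioSignals:
    voiceprint_match: float   # 0..1 confidence the speaker is the device owner
    sentiment: float          # -1..1 sentiment around a product mention
    inflection_shift: float   # 0..1 degree of excited intonation change
    product_context: float    # 0..1 strength of purchase-related context

def purchase_intent_score(s: AudioSignals) -> float:
    # Hypothetical weights; a real system would learn these from data.
    return (0.2 * s.voiceprint_match
            + 0.3 * max(s.sentiment, 0.0)
            + 0.2 * s.inflection_shift
            + 0.3 * s.product_context)

SHOW_AD_THRESHOLD = 0.6  # assumed cut-off for "high likelihood of purchase intent"

signals = AudioSignals(voiceprint_match=0.95, sentiment=0.7,
                       inflection_shift=0.5, product_context=0.8)
if purchase_intent_score(signals) >= SHOW_AD_THRESHOLD:
    print("target ads for the mentioned product")
```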
I guess my question to this is: what are the estimates of the energy required to do this on a continuous basis?
Embedded audio ML engineer here (albeit mostly outside of speech). A modern MEMS microphone typically uses 0.8 mA in full-performance mode at 1.8 V. Basic voice activity detection (VAD), which is the first step of a continuous listening pipeline, can be done in under 1 mA. Basic keyword spotting is likely doable in 10 mA, but that only runs on the audio the VAD module triggered on; let's say that is 4 hours per day. Then basic speech recognition, for buying phrases and categorization, would maybe cost 100 mA, but say only 10% of those 4 hours (0.4 hours) have keywords triggered.

That gives a total power budget of (1.8*24) + (10*4) + (100*0.4) ≈ 123 mAh per day, where 1.8 mA is the mic plus VAD running around the clock. A typical mobile phone battery is 4000 mAh, and people do not expect it to last many days anymore... So I would say this is actually in the feasible range. And this is before considering the very latest in low-power hardware, like MEMS mics with 0.3 mA power consumption or lower, MEMS microphones with built-in voice activity detection, or the low-power neural processing units (NPUs) that some microcontrollers now have.
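For anyone who wants to poke at the numbers, here is the same back-of-the-envelope arithmetic as a small Python snippet; the currents are the figures above, and the 4 h / 10% duty cycles are the stated assumptions, not measurements.

```python
# Back-of-the-envelope power budget for an always-on listening pipeline.
# Currents and duty cycles are the assumed figures from the comment above.

MIC_PLUS_VAD_MA = 1.8    # MEMS mic (0.8 mA) + voice activity detection (<1 mA)
KEYWORD_SPOT_MA = 10.0   # keyword spotting on VAD-triggered audio
SPEECH_RECO_MA = 100.0   # speech recognition on keyword-triggered audio

HOURS_PER_DAY = 24.0
VAD_ACTIVE_HOURS = 4.0                        # assumed: speech present ~4 h/day
RECO_ACTIVE_HOURS = 0.1 * VAD_ACTIVE_HOURS    # assumed: 10% of that triggers keywords

budget_mah = (MIC_PLUS_VAD_MA * HOURS_PER_DAY
              + KEYWORD_SPOT_MA * VAD_ACTIVE_HOURS
              + SPEECH_RECO_MA * RECO_ACTIVE_HOURS)

BATTERY_MAH = 4000.0  # typical phone battery

print(f"Daily listening budget: {budget_mah:.1f} mAh "
      f"({100 * budget_mah / BATTERY_MAH:.1f}% of a {BATTERY_MAH:.0f} mAh battery)")
# -> Daily listening budget: 123.2 mAh (3.1% of a 4000 mAh battery)
```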
This is amazing, thanks for doing the math. I didn't realize the tech was already feasible off the shelf. I mean, my Apple Watch can detect me saying "Hey Siri" all day with its puny battery.
If big tech isn't doing this, then it sounds like a huge startup idea worth $$$. I hope someone on here, in the spirit of HN, runs with it and blows the top off this topic once and for all: either show it's monetizable, or expose the FAANG patent sharks that come out to play and silence them for infringing on their shady microphone tech.
Thank you for taking the time to post this informative response. As a sibling comment noted, I didn't realize it was so feasible. When posting my original comment, I was thinking orders of magnitude more power would have been needed to facilitate this.
Why would they need to listen continuously?
Most people don’t speak, and aren’t around people talking to them, for more than a few minutes to hours in a day.
You just need to wake up every few seconds or minutes to check whether someone is speaking, and go back to sleep if they're not.
You could even switch on only when you detect social signals implying another person is nearby, using WiFi, Bluetooth, GPS, IP address, etc. to ID another device.
They could even pick up or recognize a second voiceprint ID, know it's your best friend, and wake up the audio recognition from ultra power-saving mode or whatever. Literally anything is possible to make this work.
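A minimal sketch of such a duty-cycled pipeline, assuming the staged approach described in this thread; all functions here (nearby_device_detected, detect_voice, spot_keywords, transcribe) are hypothetical stubs, not any real library's API.

```python
import random
import time

WAKE_INTERVAL_S = 5.0    # sleep between checks instead of listening continuously
LISTEN_WINDOW_S = 2.0    # short audio window sampled on each wake-up

# Hypothetical placeholder stages; a real pipeline would call into a mic
# driver, a VAD model, a keyword spotter, and an ASR engine.
def nearby_device_detected() -> bool:
    return random.random() < 0.5            # stand-in for a Bluetooth/WiFi scan

def read_audio_frame(seconds: float) -> bytes:
    return b"\x00" * int(16000 * seconds)   # stand-in for low-power mic capture

def detect_voice(frame: bytes) -> bool:
    return random.random() < 0.2            # stand-in for ~1 mA-class VAD

def spot_keywords(frame: bytes) -> list[str]:
    return ["buy"] if random.random() < 0.1 else []   # ~10 mA-class keyword spotting

def transcribe(frame: bytes) -> str:
    return "i should buy new running shoes"           # ~100 mA-class recognition

def listening_loop() -> None:
    while True:
        # Cheap presence check first: only listen when another device
        # (and therefore probably another person) appears to be nearby.
        if not nearby_device_detected():
            time.sleep(WAKE_INTERVAL_S)
            continue

        frame = read_audio_frame(LISTEN_WINDOW_S)

        # Stage 1: voice activity detection; sleep again if nobody is talking.
        if not detect_voice(frame):
            time.sleep(WAKE_INTERVAL_S)
            continue

        # Stage 2: keyword spotting, only on voiced audio.
        keywords = spot_keywords(frame)
        if not keywords:
            continue

        # Stage 3: full speech recognition, only on the small fraction of
        # audio that contained trigger keywords.
        print("keywords:", keywords, "| transcript:", transcribe(frame))
```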