Comment by _boffin_

1 year ago

I guess my question is: what are the estimates of the energy required to do this continuously?

Embedded audio ML engineer here (albeit mostly outside of speech). A modern MEMS microphone typically draws 0.8 mA in full performance mode at 1.8 V. Basic voice activity detection, which is the first step of a continuous listening pipeline, can be done in under 1 mA. Basic keyword spotting is likely doable in 10 mA, but it only runs on the audio that the voice activity module triggered on. Let's say that is 4 hours per day. Then basic speech recognition, for buying phrases and categorization, would cost maybe 100 mA. But say only 10% of the 4 hours = 0.4 hours have keywords triggered. That would give a total daily budget of (1.8 * 24) + (10 * 4) + (100 * 0.4) = 123 mAh per day, where the 1.8 mA is the microphone plus voice activity detection running around the clock. A typical mobile phone battery is 4000 mAh, and people do not expect it to last many days anymore... So I would say that this is actually in the feasible range. And this is before considering the very latest in low-power hardware, like MEMS mics with 0.3 mA current consumption or lower, MEMS microphones with built-in voice activity detection, or the low-power neural processing units (NPUs) that some microcontrollers now have.
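For anyone who wants to play with the numbers, here is a minimal sketch of the same back-of-envelope sum. The currents and hours are the rough figures from the comment above, not measurements; the names `stages` and `daily_charge_mah` are made up for illustration.

```python
# Back-of-envelope daily charge estimate for an always-listening pipeline.
# All figures are the rough guesses quoted in the comment, not measured values.

stages = [
    # (stage, current draw in mA, active hours per day)
    ("MEMS mic + voice activity detection", 0.8 + 1.0, 24.0),
    ("keyword spotting", 10.0, 4.0),       # only on audio flagged by VAD
    ("speech recognition", 100.0, 0.4),    # only on 10% of those 4 hours
]

def daily_charge_mah(stages):
    """Sum current (mA) times active time (h) over all stages."""
    return sum(current_ma * hours for _, current_ma, hours in stages)

battery_mah = 4000.0  # "typical" phone battery from the comment
total = daily_charge_mah(stages)
share = 100.0 * total / battery_mah

print(f"{total:.1f} mAh per day, about {share:.1f}% of a {battery_mah:.0f} mAh battery")
```

That works out to roughly 123 mAh per day, or about 3% of the battery. Changing the hours or currents in `stages` shows how sensitive the total is to each assumption.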

  • This is amazing, thanks for doing the math. Didn’t realize the tech was feasibly there already, off the shelf. I mean, my Apple Watch can detect me saying “Hey Siri” all day with its puny battery.

    If big tech isn’t doing this, then it sounds like a huge startup idea worth $$$. I hope someone on here, in the spirit of HN, runs with it and blows the top off this topic once and for all, whether it turns out to be monetizable or it exposes the FAANG patent sharks that come out to play and silence them for infringing on their shady microphone tech.

    • Hah, that's another great argument against this being a real thing: where are the startup pitches?

      If this targeting technique works and is feasible and legal and in demand by advertisers, why isn't there a competitive group of startups all trying to do it better than each other and sell the results?

      Now the conspiracy theory has grown to include "dozens of companies compete at this, all of them secretively operating in a marketplace that is entirely invisible to the outside world."

  • Thank you for taking the time to post this informative response. Like the sibling commenter, I didn't realize it was so feasible. When posting my original comment, I was thinking orders of magnitude more power would be needed for this.

Why would they need to listen continuously?

Most people don’t speak, and aren’t around people talking to them, for more than a few minutes to a few hours a day.

You just need to wake up every few seconds or minutes to see if someone is speaking, and go back to sleep if they’re not.
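A rough sketch of what that duty cycling would do to the average draw. The 1.8 mA active figure is the mic plus voice activity detection number from the estimate upthread; the sleep current, wake window, and wake period are made-up placeholders, so treat the result as illustrative only.

```python
# Average current for a listener that wakes briefly, checks for speech,
# and sleeps again. Only the 1.8 mA active figure comes from the thread;
# every other number here is a hypothetical placeholder.

def average_current_ma(active_ma, sleep_ma, awake_s, period_s):
    """Time-weighted average of active and sleep current over one period."""
    duty = awake_s / period_s
    return duty * active_ma + (1.0 - duty) * sleep_ma

avg = average_current_ma(
    active_ma=1.8,   # mic + voice activity detection (from the thread)
    sleep_ma=0.05,   # assumed sleep current
    awake_s=0.5,     # assumed listening window per wake-up
    period_s=5.0,    # assumed wake-up interval
)
daily_mah = avg * 24.0

print(f"average {avg:.3f} mA, {daily_mah:.1f} mAh per day")
```

The catch is that any speech starting between wake-ups is missed until the next check, so the wake interval trades power against how much of a conversation gets caught.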

  • Could even switch on only when it detects signals that another person is nearby, using Wi-Fi, Bluetooth, GPS, IP address, etc. to ID another device.

    They could even pick up or recognize the second voiceprint ID, know it’s your best friend, and wake up the audio recognition from ultra power saving mode or whatever. Literally anything is possible to make this work.