
Comment by Aurornis

2 months ago

> So the idea that it takes a huge amount of computing resources, battery life, permissions, or bandwidth to do matching of keywords is hilarious.

I also knew an entrepreneur who tried this same thing, but with TV shows.

Fingerprinting specific audio is a different algorithmic problem entirely. You only need to sample a short section of audio every few minutes and then extract the spectral peaks, which are fingerprinted and matched against a database of known samples.

This is how apps that name a song work. It’s not the same as constant full speech to text.
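To make the spectral-peak idea concrete, here is a toy sketch. It is not how any shipping song-recognition app works: the frame size, the "strongest bin per frame" rule, the pairing of consecutive peaks, and every name in it (`peak_bins`, `fingerprints`, `identify`, the `jingle_*` clips) are assumptions made up for illustration, and the "audio" is synthetic pure tones.

```python
import cmath
import hashlib
import math

FRAME = 64  # samples per analysis frame (arbitrary value for this sketch)

def dft_magnitudes(frame):
    """Magnitudes of the lower half of a naive DFT (slow, but fine for a toy)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(frame)))
        for k in range(n // 2)
    ]

def peak_bins(samples, frame=FRAME):
    """Strongest non-DC frequency bin of each non-overlapping frame."""
    peaks = []
    for start in range(0, len(samples) - frame + 1, frame):
        mags = dft_magnitudes(samples[start:start + frame])
        peaks.append(max(range(1, len(mags)), key=mags.__getitem__))
    return peaks

def fingerprints(samples):
    """Hash each pair of consecutive peaks into a short fingerprint."""
    peaks = peak_bins(samples)
    return {
        hashlib.sha1(f"{a}:{b}".encode()).hexdigest()[:10]
        for a, b in zip(peaks, peaks[1:])
    }

def tone(bin_index, frame=FRAME):
    """Synthetic test signal: one frame of a pure tone centred on a DFT bin."""
    return [math.sin(2 * math.pi * bin_index * i / frame) for i in range(frame)]

# Hypothetical "database" of known clips, keyed by fingerprint.
known = {
    "jingle_a": tone(5) + tone(9) + tone(12),
    "jingle_b": tone(7) + tone(3) + tone(20),
}
database = {fp: name for name, clip in known.items() for fp in fingerprints(clip)}

def identify(samples):
    """Return the known clip sharing the most fingerprints, or None."""
    votes = {}
    for fp in fingerprints(samples):
        if fp in database:
            votes[database[fp]] = votes.get(database[fp], 0) + 1
    return max(votes, key=votes.get) if votes else None

# A short excerpt from the middle of jingle_a is enough to match it.
excerpt = tone(9) + tone(12)
result = identify(excerpt)
```

The point of the sketch is only that the lookup works on a short excerpt of a known recording. It says nothing about recognising arbitrary speech, which is the distinction being drawn here.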

But you’re skipping the key part of the story: They had to hand out phones specifically for this because you can’t get constant audio background processing from installing an app on a modern phone OS without the user noticing.

> That's what "siri", "hey google", "alexa" etc are all doing 24 hours a day.

Again, wake word monitoring is a different algorithm. Monitoring for a wake word is a much simpler problem. They’re not processing everything you say, converting it to text, and then doing a string compare for the wake word. It’s a very tiny learned model trained to match on a very specific phrase, which might run at a hardware level.

I agree it's a different algorithm, but not a higher powered one. You don't need to know context to get HELOC, Bariatric, or DUI. You also don't need 95%+ accuracy for 95% of the population. You're just doing advertising.

  • Doing 100 different matches updated frequently is an entirely different problem than matching a single wake word that isn’t changing.

    Regardless, this would require so much coordination, network traffic, and on-device code that could be reverse engineered that you’re implying nobody has ever found a hint of it existing and no employees of these companies have ever leaked any hint of it.

    It’s very much in the domain of conspiracy theories.

    • Well, actually, when you're hash-based, doing 100 different matches is the easy part. I'm not sure you know how steep FAR/FRR (false accept rate / false reject rate) curves are for >99%/95% single-word accuracy, but having seen wake word development, it's easily 100x harder than 95%/90% accuracy, and none of the heavy calculation other than voice compression needs to be done locally or in a short time period. The network traffic is literally a few hundred hashes downloaded and hundreds of bits of hash matches a day (~1kB).

      Even in the article there are multiple reports of it that are dismissed, and even though reverse engineering larger apps on iPhone/Android is certainly possible, with obfuscation, searching for yet another hash table match or simple voice compression routine is also quite difficult. Where are all the other articles reporting on the reverse engineering of the very screencap apps this article talked about? Are they also just more well-documented conspiracy theories?

      Frankly, your best argument is that nobody is selling this as a product. So maybe there are easier, more effective methods, but not because it can't be done or hasn't been done (since it literally has, and it's been reported). It's kinda the opposite of a conspiracy theory. You have to assume that everyone capable with a vested interest won't do it, or that all of them will be caught, or that making money with ads becomes unpopular.
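For readers following the hash argument in this sub-thread, here is a minimal sketch of what "hash based" matching and the "~1kB" estimate could look like. Everything in it is an assumption for illustration: the 4-byte truncated hash, the figure of 300 published hashes, and the names `short_hash` and `matched_hashes`. It also starts from text tokens, so it skips the step of turning audio into something hashable, which is the part the thread disagrees about.

```python
import hashlib

HASH_BYTES = 4  # truncated hash length (assumed, not from the thread)

def short_hash(token: str) -> bytes:
    """Truncated SHA-256 of a lower-cased token."""
    return hashlib.sha256(token.lower().encode()).digest()[:HASH_BYTES]

# Server side: only hashes of the keywords of interest are published.
keywords = ["heloc", "bariatric", "dui"]
published = {short_hash(k) for k in keywords}

def matched_hashes(tokens):
    """Device side: set lookups, so cost per token does not grow with the list."""
    return {h for h in map(short_hash, tokens) if h in published}

# Hypothetical token stream; producing it from audio is out of scope here.
tokens = "we should ask the bank about a HELOC next week".split()
hits = matched_hashes(tokens)

# Rough size of the download under the assumptions above.
download_bytes = 300 * HASH_BYTES  # "a few hundred hashes"
```

Under these assumed numbers the hash list is about 1.2 kB, which is in the same range as the figure quoted above. Whether the upstream audio processing is feasible without being noticed is a separate question that this sketch does not address.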