Comment by kurthr
2 months ago
I can just say that I knew an entrepreneur in early post Y2K who developed apps to track music played in clubs in SF for folks like ASCAP, BMI, and SESAC. They gave out "free" phones (these were the small expensive candybars and nice flip/slideups) to the influencers of the day. They compressed the audio for orthogonality, and had a huge number of hashes to match. If they got more than a few consecutive matching hashes at a location that wasn't paying royalties, they got an enforcement call.
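To make the "more than a few consecutive matching hashes" rule concrete, here is a minimal sketch. It is not the entrepreneur's actual system: the function names, the `min_run` threshold of 4, and the simplification that any hash in the known set counts as a match are all assumptions for illustration.

```python
def longest_match_run(observed_hashes, known_hashes):
    """Length of the longest streak of consecutive observed hashes found in known_hashes."""
    best = run = 0
    for h in observed_hashes:
        # Extend the streak on a match, reset it on a miss.
        run = run + 1 if h in known_hashes else 0
        best = max(best, run)
    return best


def should_flag(observed_hashes, known_hashes, min_run=4):
    """Hypothetical rule: flag a location once a streak reaches min_run matches."""
    return longest_match_run(observed_hashes, known_hashes) >= min_run
```

Requiring a streak rather than a single hit is what lets a lossy, low-accuracy matcher still produce a usable signal: isolated false positives never reach the threshold.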
So the idea that it takes a huge amount of computing resources, battery life, permissions, or bandwidth to do matching of keywords is hilarious. That's what "siri", "hey google", "alexa" etc are all doing 24 hours a day. Just add another hundred and report them once an hour. You don't need low latency. It's just another tool in the bag!
Of course the cat food example is bad, because if they weren't looking for that you wouldn't get a response. Who would be willing to pay big for clicks on cat food? Now bariatric surgery? DUI? HELOC? Those pay.
>That's what "siri", "hey google", "alexa" etc are all doing 24 hours a day.
You might have just convinced me that the “phone is listening” is total bunk, because these dedicated devices are just so bad at recognizing the very specific, short, phrases when explicitly directed at them that I can’t imagine they are listening for much more. Listening to my in-laws try to activate their Alexa and Google Homes is something the CIA might consider for their next torture method.
You expect 95% accuracy matching activation phrases. You don't need that for ads. It only needs to work some of the time for some of the people, especially if it makes $/click.
>You expect 95% accuracy matching activation phrases.
At this point I don’t even expect 50% (trying twice), and I’m still disappointed.
>It only needs to work some of the time for some of the people, especially if it makes $/click.
So where can one find this market? We know the price of traditional ad clicks. Surely we’d see a market for “voice-driven” ads with higher rates?
> So the idea that it takes a huge amount of computing resources, battery life, permissions, or bandwidth to do matching of keywords is hilarious.
I also knew an entrepreneur who tried this same thing, but with TV shows.
Fingerprinting specific audio is a different algorithm problem entirely. You only need to sample a short section of audio every few minutes and then process the spectral peaks, which are fingerprinted against a database of known samples.
This is how apps that name a song work. It’s not the same as constant full speech to text.
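As a rough illustration of the "spectral peaks, fingerprinted against a database" idea, here is a toy sketch in plain Python. It is not how any particular song-naming app works: real systems use many time windows and more robust hashing, and every name here (`spectral_peaks`, `fingerprint`, `tone_mix`) is made up for the example. It uses a naive DFT on one short window and hashes the strongest frequency bins.

```python
import cmath
import hashlib
import math


def spectral_peaks(samples, n_peaks=3):
    """Return the indices of the n_peaks strongest DFT bins (naive O(n^2) DFT)."""
    n = len(samples)
    mags = []
    for k in range(1, n // 2):
        acc = sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        mags.append((abs(acc), k))
    mags.sort(reverse=True)
    return sorted(k for _, k in mags[:n_peaks])


def fingerprint(samples, n_peaks=3):
    """Hash the peak bin indices into a short hex string usable as a lookup key."""
    peaks = spectral_peaks(samples, n_peaks)
    return hashlib.sha1(",".join(map(str, peaks)).encode()).hexdigest()[:16]


def tone_mix(freq_bins, n=256):
    """Synthetic test signal: a sum of sine waves at the given bin frequencies."""
    return [sum(math.sin(2 * math.pi * f * t / n) for f in freq_bins) for t in range(n)]


# Hypothetical database mapping fingerprints to track names.
database = {fingerprint(tone_mix([10, 25, 40])): "track A"}

# A quieter copy of the same signal keeps the same peaks, so it still matches.
quieter = [0.5 * s for s in tone_mix([10, 25, 40])]
match = database.get(fingerprint(quieter))
```

The point of the sketch is that only the peak positions are kept, so the lookup works on a short snippet and needs nothing resembling speech recognition.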
But you’re skipping the key part of the story: They had to hand out phones specifically for this because you can’t get constant audio background processing from installing an app on a modern phone OS without the user noticing.
> That's what "siri", "hey google", "alexa" etc are all doing 24 hours a day.
Again, wake word monitoring is a different algorithm. Monitoring for a wake word is a much simpler problem. They’re not processing everything you say, converting it to text, and then doing a string compare for the wake word. It’s a very tiny learning model trained to match on a very specific phrase, which might run at a hardware level.
I agree it's a different algorithm, but not a higher powered one. You don't need to know context to get HELOC, Bariatric, or DUI. You also don't need 95%+ accuracy for 95% of the population. You're just doing advertising.
Doing 100 different matches updated frequently is an entirely different problem than matching a single wake word that isn’t changing.
Regardless, this would require so much coordination, network traffic, and on-device code that could be reverse engineered that you’re implying nobody has ever found a hint of it existing and no employees of these companies have ever leaked any hints of it existing.
It’s very much in the domain of conspiracy theories.
What kind of keywords would you imagine provide an actual, profitable advantage to an ad company? I can't imagine "computer 2", "fridge 3", "egg 4" being all that valuable compared to.. literally my whole browser history and my reaction to other ads/videos (I looked at that short for 10s vs immediately skipping builds a very nice profile). And now add i18n in the picture - even the main AI assistant products suck in anything other than English, so this fancy, advanced technology with low return of value would end up with a low target audience as well.
Also, "Siri" and the like ends up waking the main processor, which is definitely easy to prove/disprove. Just talk to your phone continuously for a long time and see if it wakes.
Low, even very low, return of value is not no return. Therefore, given they make some return, and it has some value, that's enough for them to do it. Ads and ad data are two sides. We are often not the target for an ad, but our data provides stats about how an ad is performing. If more consumers are influenced to spend $1000 on something than not, then it's worth it for them. It's an aggregate cost-benefit analysis, not a question of how effective it is at the isolated individual level.
Another thing to consider is that we should never fall into the trap of thinking we are immune from influence from advertisers. Firstly, it's basically what advertisers want: it allows more actions like this and more of our data to be sold. Secondly, it's easier to influence someone if they think of a decision as their own choice than if they think they were manipulated into it. We do not remember the ads we see but we can remember that we are all susceptible to influence.
Return of value is with respect to the costs of it. A lawsuit/brand value loss from illegally recording every communication you make (which we would have definite proof of if it were happening, given that there are more phones than people on Earth) would far outweigh the tiny benefit (if any? I'm not convinced you would get any extra information in the general case compared to the tracking of the regular usage of your phone).
Also, I don't see the relevance of your second paragraph. The baseline is not "no ads", the baseline is "ads supported by all the tracking that Meta/Google currently does".