Comment by sixhobbits
1 year ago
Pixel phones run song identification constantly now. They have a local database of the top 1000 (?) most popular songs. It has negligible impact on battery life.
Not saying I agree that 'phones are listening to show us ads', but technically we have the capability for that to happen (sampling audio every X seconds and matching it against a local database of keywords).
Add at least two zeros to your number. Pixel phones can detect the top 11k songs while being offline (it used to be more). The fingerprint database for this is around 500 MB in size.
I think it is very easy to sneak a few (thousand) extra fingerprints in this database and do all kinds of tracking with it. All while the green microphone icon is disabled.
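For a rough sense of scale, here is the back-of-the-envelope arithmetic using only the two figures quoted above (about 500 MB, 11k songs). The assumption that an extra entry would cost as much as an average song entry is mine, and the 3,000 is just a stand-in for "a few thousand"; treat it as a sketch, not a description of the actual database layout.

```python
# Figures quoted in the comment above; both are rough estimates.
DB_SIZE_BYTES = 500 * 10**6   # "around 500 MB"
SONG_COUNT = 11_000           # "top 11k songs"


def bytes_per_entry(db_size_bytes: int, entries: int) -> float:
    """Average storage per fingerprint entry."""
    return db_size_bytes / entries


def growth_fraction(extra_entries: int, entries: int) -> float:
    """Relative database growth, assuming each extra entry costs
    the same as an average song entry (an assumption)."""
    return extra_entries / entries


per_song = bytes_per_entry(DB_SIZE_BYTES, SONG_COUNT)
print(f"{per_song / 1000:.1f} KB per song")                            # 45.5 KB per song
print(f"{growth_fraction(3000, SONG_COUNT):.0%} growth for 3,000 extras")  # 27% growth
```

Under that pessimistic assumption, a few thousand extra entries would be a noticeable but not huge change in database size; shorter keyword fingerprints would presumably cost less.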
For argument’s sake, let’s be generous and stipulate your phone is listening for 11k keywords to serve you ads.
Why would “pool fencing” take up one of those valuable keyword slots on everyone’s phone?
And you’re going to see way fewer than 11k ads per day. Why would the ad server prioritize serving an ad for pool fencing (a phrase said once) over all the far more common topics a person talks about in a typical day, like movies, TV shows, food and drink, clothes, cars, consumer electronics, music, etc.?
"look into" is a much more likely trigger: on a match, the phone could send the 30 seconds before and after to a server for more analysis. "buying" could be another. It's not like it would be that hard, especially with some of the pretty good vocal compression available for audio. It would be a small blip on a modern connection, even a wireless one.
I'm not saying it is or isn't happening but it wouldn't be hard.
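To put a number on "small blip": a minimal estimate of the upload size for 30 seconds before plus 30 seconds after a trigger. The 16 kbit/s speech bitrate and the 5 Mbit/s uplink are assumptions picked for illustration, not measurements of any real app.

```python
def clip_size_bytes(seconds: float, bitrate_bps: float) -> float:
    """Size of a compressed audio clip at a constant bitrate."""
    return seconds * bitrate_bps / 8


def upload_seconds(size_bytes: float, uplink_bps: float) -> float:
    """Time to upload the clip, ignoring protocol overhead."""
    return size_bytes * 8 / uplink_bps


CLIP_SECONDS = 30 + 30        # 30 s before and 30 s after the trigger
SPEECH_BITRATE_BPS = 16_000   # assumed low-bitrate speech codec setting
UPLINK_BPS = 5_000_000        # assumed mobile uplink speed

size = clip_size_bytes(CLIP_SECONDS, SPEECH_BITRATE_BPS)
print(f"{size / 1000:.0f} KB per clip")                      # 120 KB per clip
print(f"{upload_seconds(size, UPLINK_BPS):.3f} s to upload")  # 0.192 s to upload
```

So under these assumptions a single clip is on the order of a small image.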
Your argument assumes that the phone listening would be the only source of information for the ad networks. But it would be much more complex than that. It would be only one of many signals that are used to serve the consumer the right advertisement at the right moment. So it doesn't need to have the exact phrase "pool fencing" in the database. It just needs to detect that something about pools, or swimming, etc. was talked about. Since Google has thousands of signals and statistics about this person (like browsing history, current location, the other smartphones nearby and their histories, etc.), it can sell the ad space to "pool fencing" and expect a high click-through rate. Selling ads is a bit like the current LLMs. It's just a stochastic parrot that hallucinates stuff. But the stuff is often the advertisement that brings in the most money.
I don't know if phones listen to us to serve ads, but 11K is a decent vocab. Most adults have a vocab of 20K. Therefore I could imagine it including the words "pool" and "fencing".
Now Playing only has to sample for a few seconds every few minutes when the phone is powered on for other reasons (like to participate in cellular check-ins). This is because a song is typically several minutes long and you only have to fingerprint for a few seconds. It doesn't matter which few seconds. It's not continuously listening, so it's not the same thing at all.
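The duty-cycle argument above can be made concrete with a small sketch. The 8-second sample and 3-minute period are hypothetical values standing in for "a few seconds every few minutes"; they are not taken from any Now Playing documentation.

```python
def duty_cycle(sample_s: float, period_s: float) -> float:
    """Fraction of time the microphone is actually recording."""
    return sample_s / period_s


def always_captured(sound_s: float, sample_s: float, period_s: float) -> bool:
    """True if a sound of this length must contain one complete sampling
    window regardless of when it starts (one window every period_s)."""
    # Worst case: the sound starts just after a window began, so the next
    # full window starts almost period_s later and needs sample_s to finish.
    return sound_s >= period_s + sample_s


SAMPLE_S = 8     # hypothetical: "a few seconds"
PERIOD_S = 180   # hypothetical: "every few minutes"

print(f"duty cycle: {duty_cycle(SAMPLE_S, PERIOD_S):.1%}")  # duty cycle: 4.4%
print(always_captured(210, SAMPLE_S, PERIOD_S))  # True: a 3.5-minute song
print(always_captured(3, SAMPLE_S, PERIOD_S))    # False: a 3-second phrase
```

With these numbers a typical song is always sampled at least once, while a short spoken phrase usually falls between windows, which is the difference the comment is pointing at.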
Song identification and speech-to-text are massively different algorithms.
How does it work? Bought my gf a Pixel 8a recently.
These conspiracy theories have been floating around since long before any of this became practical.