
Comment by snowwrestler

1 year ago

Think about what this implies. If your phone is listening, it’s listening all the time, right? So like 12-18 hours of continuous audio every day. That’s a lot of ad triggers. Way too many to actually be served with ads during your browsing time, which is a strict subset of your total audible proximity to your phone (plus ad inventory is a strict subset of what you view on your phone).

So how does the phone + ad networks decide which words to prioritize to trigger which ads when?

So for this anecdote to be true, not only would the phone have to be listening, but the targeting algorithm would need to decide to actively exclude all the other audible triggers from that time period, and fill your limited ad impression inventory with the one phrase you were intentionally testing.

How would it do that? Especially if this is indeed an outlier one-off topic of conversation that you cover in a single sentence. There would not be contextual clues (like repetition over time) that might indicate you are actually “in market” for a pool fence.

To me this is the problem with these anecdotal tests. You understood that that was an important phrase in the context of ad targeting. But how did the automated ad system know it should serve you ads on that topic, and not one of the many other advertisable topics you talk about over the course of several days? Or that your phone hears over several days?

1) App stores the trailing two minutes of speech in memory.

2) If the app detects a consumption-related trigger word, the related conversation is flagged for transmission to the server.

3) Flagged audio block is converted to text. Consumption-related verbs ("buy", "purchase", etc.) are identified. The syntax of the sentence clearly indicates which noun is the target of a given consumption-related verb ("new car", "pool fencing").

4) Serve related ads
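
For what it's worth, here is a toy sketch of steps 1-3 in Python. Everything in it (the trigger list, the chunk size, the crude "parsing") is invented for illustration; a real implementation would run speech-to-text and proper syntactic analysis upstream.

    from collections import deque

    # Hypothetical pipeline sketch: keep a rolling ~2 minute transcript,
    # flag chunks containing purchase-intent verbs, and pull out the words
    # after the verb as the ad-targeting phrase. Illustration only.

    TRIGGER_VERBS = {"buy", "purchase", "looking for"}

    class RollingTranscript:
        def __init__(self, max_chunks=24):      # e.g. 24 chunks of ~5 s each
            self.chunks = deque(maxlen=max_chunks)

        def add(self, text):
            self.chunks.append(text.lower())

        def flagged(self):
            """Return (verb, rest-of-sentence) pairs that would be sent upstream."""
            hits = []
            for chunk in self.chunks:
                for verb in TRIGGER_VERBS:
                    if verb in chunk:
                        # Crude stand-in for real syntactic parsing.
                        target = chunk.split(verb, 1)[1].strip(" .!?")
                        hits.append((verb, target))
            return hits

    buf = RollingTranscript()
    buf.add("We really should buy a pool fence before summer.")
    buf.add("Anyway, how was the game last night?")
    print(buf.flagged())   # [('buy', 'a pool fence before summer')]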

  • Where's the proof that this is happening?

    Lots of people run network traffic sniffers to see what apps are doing. Lots of people decompile apps. Lots of people at companies leak details of bad things they are doing.

    Why has nobody been able to demonstrate this beyond anecdotes about talking about swimming pools and then getting adverts for swimming pool stuff?

    • These are fair questions! I'm not convinced that it is happening. Nor am I convinced, as the parent seems to be, that it would be difficult to do.

      edit: Having re-read my comment, I can see how it could easily be read to say "It's happening and this is how it works", whereas I intended to convey something like "It could easily be done and here's how." I have a bad habit of implying my point rather than stating it outright. I'm working on it!

      1 reply →

    • I have a suspicion it's not Facebook or Google listening in, but rather other third-party apps. In fact, it's not even the third-party apps themselves but the libraries/frameworks they use to show ads.

      9 replies →

    • There will be no proof until somebody inside Apple who is in on the scam decides to grow a conscience and blow the whistle. Then they will be dismissed as a "disgruntled employee". Decompiling Siri probably will get you a lot of attention from very expensive lawyers that will make your life very interesting for a long while.

      2 replies →

    • I can't even begin to tell you how many times I've been randomly having a conversation with someone, only to be alerted to the sound of the Google Assistant suddenly responding to what we're saying. Something we said was interpreted as a wake word, and then from that point on, every single thing we said was transcribed via STT, sent to Google's servers, various Google search queries were run, etc, and then the assistant responded - because it thought it was responding to a valid query and had no way of knowing otherwise. This has gotten worse with Gemini but has in no way been limited to that.

      In this situation, I was alerted to this because the assistant started responding. However, I've also been in situations where I tried deliberately to talk to the assistant and it failed silently. In those situations, the UI spawns the Assistant interaction dialog, listens to what I say and then just silently closes. Sometimes this happens if there's too much background noise, for instance, and it then just re-evaluates that it wasn't a valid query at all and exits. Sometimes some background process may be frozen. Who knows if this happens before or after sending the data to the server. Sometimes the dialog lingers, waiting for the next input, and sometimes it just shuts off, leaving me (annoyingly) to have to reopen the dialog.

      Putting that together, I have no idea how many times the Google Assistant has activated in my pocket, gone live, recorded stuff, sent it to Google's servers, realized it wasn't a valid query, and shut off without alerting me. I've certainly seen the Assistant dialog randomly open when looking at my phone plenty of times, which is usually a good indicator that such a thing has happened. If it silently fails in such a way that the UI doesn't respawn, then I would have no idea at all.

      The net effect is that Google gets a random sample from billions of random conversations from millions of people every time this thing unintentionally goes off. They have a clear explanation as to why they got it and why ads are being served in response afterward. They can even make the case that the system is functioning as intended - after all, it'd be unreasonable to expect no false positives, or program bugs, or whatever, right? They can even say it's the user's fault and that they need to tune the voice model better.

      Regardless, none of this changes the net result, which is they get a random sample of your conversation from time to time and are allowed to do whatever with it that they would have done if you sent it on purpose.

      1 reply →

    • I don't know - where is the proof it's not? It's not like I can look at the source to the ad SDKs and figure out what they're doing.

      It's much better and safer to assume they are than they're not, especially because I've seen many many results which indicate they are.

      3 replies →

Your voice is unique and can be fingerprinted to identify you (see Alexa devices). Add in things like positive sentiment analysis, changes in vocal inflection/intonation, and the context surrounding spoken product mentions (purchase inference/intent), and you could probably triangulate a threshold for showing products with a high likelihood of purchase intent.
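
To make that concrete, here is a purely hypothetical sketch of what "triangulating a threshold" could look like: combine a few weak signals into one purchase-intent score and only serve the ad above a cutoff. The signal names, weights, and threshold are all invented for the example.

    # Invented example: weighted combination of weak signals into a purchase-intent score.
    WEIGHTS = {"speaker_match": 0.2, "positive_sentiment": 0.3, "intent_phrase": 0.5}
    THRESHOLD = 0.6

    def purchase_intent(signals):
        """Weighted sum of signal strengths, each assumed to be in [0, 1]."""
        return sum(WEIGHTS[name] * signals.get(name, 0.0) for name in WEIGHTS)

    signals = {"speaker_match": 0.9, "positive_sentiment": 0.7, "intent_phrase": 0.8}
    score = purchase_intent(signals)
    print(round(score, 2), score >= THRESHOLD)   # 0.79 True -> eligible to show the ad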

Really smart people have been working on these things at Google for decades, and that barely scratches the surface of this nuanced discussion. Mobile CPUs/GPUs have only gotten faster and smaller, with more RAM available and better power management across the board.

Anything is possible if there is money to be made and it's not explicitly illegal, or better yet, they can just pay the fines after making their 100x ROI.

  • I guess my question to this is: what are the estimates of the energy required to do this continuously?

    • Embedded audio ML engineer here (albeit mostly outside of speech). A modern MEMS microphone typically uses 0.8 mA in full-performance mode at 1.8 V. Basic voice activity detection, which is the first step of a continuous listening pipeline, can be done in under 1 mA. Basic keyword spotting is likely doable in 10 mA, but that only runs on the portions the voice activity module triggers on; let's say that is 4 hours per day. Basic speech recognition, for buying phrases and categorization, would maybe cost 100 mA, but say only 10% of those 4 hours = 0.4 hours have keywords triggered.

      That would give a total power budget of (1.8 * 24) + (10 * 4) + (100 * 0.4) ≈ 123 mAh per day. A typical mobile phone battery is 4000 mAh, and people no longer expect it to last many days anyway... So I would say this is actually in the feasible range. And that is before considering the very latest in low-power hardware: MEMS mics with 0.3 mA power consumption or lower, MEMS microphones with built-in voice activity detection, or the low-power neural processing units (NPUs) that some microcontrollers now have.
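
      In case it helps, here is that back-of-the-envelope arithmetic as a tiny Python snippet. The current draws and the 4 h / 0.4 h duty-cycle figures are the rough assumptions above, not measurements.

        # Rough daily energy budget using the figures above (assumptions, not measurements).
        mic_and_vad_ma = 1.8   # MEMS mic (~0.8 mA) + voice activity detection (~1 mA), always on
        kws_ma = 10            # keyword spotting, only while speech is detected
        asr_ma = 100           # on-device speech recognition, only on flagged audio

        hours_total = 24
        hours_speech = 4       # assumed hours of detected speech per day
        hours_flagged = 0.4    # assumed 10% of speech contains trigger keywords

        budget_mah = (mic_and_vad_ma * hours_total
                      + kws_ma * hours_speech
                      + asr_ma * hours_flagged)

        print(f"{budget_mah:.0f} mAh/day")                       # ~123 mAh/day
        print(f"{budget_mah / 4000:.1%} of a 4000 mAh battery")  # ~3.1%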

      4 replies →

    • Why would they need to listen continuously?

      Most people don’t speak, and aren’t around people talking to them, for more than a few minutes to hours in a day.

      You just need to wake up every few seconds or minutes to check whether someone is speaking, and go back to sleep if they're not.

      1 reply →

My phone can listen all day, every day. It listens for "hey Google" and it can passively identify songs that are playing. It's not outside the realm of possibility to do the same kind of audio fingerprinting on keywords and whatnot. The advertising potential makes it extremely juicy.

  • Your phone can listen for “hey Google” because it’s only one phrase and the model can run at very low power on specialized hardware. If you want to add 1000 keywords the battery drain would be intense.

    • Pixel phones run song identification constantly now. They have a local database of the top 1000 (?) most popular songs. It has negligible impact on battery life.

      Not saying I agree that 'phones are listening to show us ads', but technically we have the capability for that to happen (sampling audio every X intervals and matching against a local database of keywords).
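
      A minimal sketch of that sample-and-match loop, purely for illustration; the keyword list and the two helper functions are stand-ins, since a real implementation would run a small on-device model rather than anything like this.

        import time

        KEYWORDS = {"pool fence", "new car", "mortgage"}   # hypothetical local database

        def capture_window(seconds=5):
            """Stand-in for reading a few seconds of audio from the microphone."""
            return b""

        def spot_keywords(audio, keywords):
            """Stand-in for a small on-device keyword-spotting model."""
            return set()

        def sampling_loop(interval_s=30, iterations=3):
            hits = []
            for _ in range(iterations):
                hits.extend(spot_keywords(capture_window(), KEYWORDS))
                time.sleep(interval_s)   # sleep (or hand off to a low-power DSP) in between
            return hits

        print(sampling_loop(interval_s=0))   # [] with these stub helpers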

      11 replies →

The system knows to serve you ads about the new topic because it's new. You're already getting ads for the stuff you're normally talking about. The new topic stands out easily.
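
As a toy illustration of how "the new topic stands out" could work: rank candidate ad topics by how unusual they are relative to a running history of topic counts. The topics and counts here are made up.

    from collections import Counter

    # Invented example: a topic with no history gets the highest novelty score.
    history = Counter({"football": 40, "cooking": 25, "cars": 10})
    todays_topics = ["cooking", "pool fence", "football"]

    def novelty(topic):
        return 1.0 / (1 + history[topic])

    ranked = sorted(todays_topics, key=novelty, reverse=True)
    print(ranked)   # ['pool fence', 'cooking', 'football']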

It doesn't have to be your phone. Could be your TV or any other device.

Most importantly there's just patterns of behavior. Companies are absolutely desperate for every scrap of data they can get on you. Why would they not capture audio from your mic?

You’re so right. We should just trust the computers in our pockets, hands, and nightstands 24/7/365 running proprietary operating systems, firmware, and sensor suites phoning home as much targeting data as they can possibly collect — but not that! What could they possibly gain from harvesting that?

  • Companies really are using tons of highly sensitive data to target ads, even when we sleep. But they're not generally using microphones to record audio to do it. Both things can be accurate statements.

It's not a strict 12-18 hour window. Instead, it depends on the time frame between specific vocal or conversational cues (the signal versus the noise).

>So how does the phone + ad networks decide which words to prioritize to trigger which ads when?

The same way they analyze your email and web searches. Basically, statistics.

>To me this is the problem with these anecdotal tests. You understood that that was an important phrase in the context of ad targeting. But how did the automated ad system know it should serve you ads on that topic, and not one of the many other advertisable topics you talk about over the course of several days? Or that your phone hears over several days?

Buddy, so many people have witnessed this happening for at least 10 years, and even run experiments at this point, that it's common knowledge. I know for a fact that one of my friends now has a phone that is especially receptive to hearing me say things around it, because our conversation topics ALWAYS come up in my searches, ads, and feeds shortly after. Think about that. Someone else's phone sends data to a cloud that I never gave permission to. It then puts that together with data from MY phone about where I was (perhaps even the devices chirping at each other!). The aggregation happens within a week, and then I see relevant ads. I've seen this happen dozens of times. It's no coincidence.

As for the article, I'm not even going to read it. It's got to be stupid. We know from leaks, reverse-engineering, and personal experience that this spying is going on. I question the source of this article, but I suppose we should never underestimate the lengths someone will go to in order to feel that they are smarter than the rest of us with our eyes open.

  • "We know from leaks, reverse-engineering"

    I would be VERY interested to hear details of those leaks and that reverse-engineering. I've only ever heard the personal anecdotes.

    (If you'd read my article you would have seen this bit at the top: "Convincing people of this is basically impossible. It doesn’t matter how good your argument is, if someone has ever seen an ad that relates to their previous voice conversation they are likely convinced and there’s nothing you can do to talk them out of it.")

    • I truly wish I had a bibliography to give you, but it has been so obviously true to me that I haven't bothered to catalogue all of this information. I'll try to get you started, though. Start by familiarizing yourself with the Snowden leaks and how the government buys data from private companies to violate the constitution. Second, look for articles like this one: https://www.pcworld.com/article/2450052/do-smartphones-liste... This kind of thing is published periodically. Apple also very recently agreed to settle a lawsuit over Siri "inadvertently" spying: https://arstechnica.com/tech-policy/2025/01/apple-agrees-to-... There is no reason to believe that your phone is ever not listening. The audio can at least be transcribed and catalogued.

      If companies are willing to track your every click and mouse movement, every footstep and slight movement you make with your phone even while you are asleep, build and bundle keyboard apps to capture what you type, monitor you with AI, etc., are you seriously suggesting that they would not also listen to you? None of that is fiction. It's established tech that has been documented over time. The only reason it's not 100% illegal is because the EULA probably covers it.

      I swear, people who think companies aren't listening when they could be seem like people who would be shocked to learn that an armed carjacker might demand your wallet in addition to your car. Unreal...

      Oh yeah, one more tip: try the data export feature from Google or Facebook. You might just be surprised by what you find. I've heard of people finding recordings of private conversations picked up by Google devices. I personally found hundreds of Facebook messages and posts that I had deleted with a tool and that aren't visible to anyone (OK, maybe the messages make sense, but not the posts).

      9 replies →

  • >We know from leaks, reverse-engineering, and personal experience that this spying is going on.

    No we don't. There isn't any of that. This is flat earthing for technophiles.

"How would it do that? "

AI.

That is the entire premise of 'Nexus' by Yuval Harari.

Individualized, bot-driven surveillance.

https://www.theregister.com/2024/09/16/oracle_ai_mass_survei...

"Ellison declares Oracle all-in on AI mass surveillance, says it'll keep everyone in line

Cops to citizens will be 'on their best behavior because we're constantly recording and reporting'"

  • How were they doing this in 2017?

    • A lot of things happening now were happening before; that doesn't mean things don't improve. Increased efficiency is the issue now. In 2017, maybe some simple algorithms, or a person, intervened to drive the ads; now AI is a big step change in better targeting.

      That was one of the points in the book: in 2017 or before, a surveillance state was limited by the number of people it takes to do the actual surveillance. Now AI increases the efficiency.