Comment by autoexec

6 days ago

I'm happy to see it. They should have included Roku in that too!

> Roughly twice per second, a Roku TV captures video “snapshots” in 4K resolution. These snapshots are scanned through a database of content and ads, which allows the exposure to be matched to what is airing. For example, if a streamer is watching an NFL football game and sees an ad for a hard seltzer, Roku’s ACR will know that the ad has appeared on the TV being watched at that time. In this way, the content on screen is automatically recognized, as the technology’s name indicates. The data then is paired with user profile data to link the account watching with the content they’re watching.

https://advertising.roku.com/learn/resources/acr-the-future-...

I wouldn't be surprised if my PS5 was doing the same thing when I'm playing a game or watching a streaming service through it.

The most likely case is that the TV is computing a hash locally and sending only the hash. Judging by my dnstap logs, the Roku TV maintains a steady ~0.1/second heartbeat to `scribe.logs.roku.com` with occasional pings to `captive.roku.com`. The rest are stragglers that are blocked by a `*.roku.com` DNS blackhole. Another one is `api.rokutime.com`, but as of writing it's a CNAME to one of the `roku.com` subdomains.

The block rate seems to correlate with watch time, increasing to ~1/second, so it's definitely trying to phone home with something. Too bad it can't, since all its traffic going outside the LAN is dropped with prejudice.

If your network lets you see stuff like that, look into what the PS5 is trying to do.

  •   > Most likely ... sending the hash
    

    If you're tracking packets can't you tell by the data size? A 4k image is a lot more data than a hash.

    I do suspect you're right since they would want to reduce bandwidth, especially since residential upload speeds are slow but this is pretty close to verifiable, right?

    Also just curious, what happens if you block those requests? I can say Samsung TVs really don't like it... but they will be fine if you take them fully offline.

    • > If you're tracking packets can't you tell by the data size? A 4k image is a lot more data than a hash.

      I admit, I've not gotten around to properly dumping that traffic. For anyone wanting to do this, there's also a spike of DNS requests every hour on the hour, even if the TV is off (well, asleep). Would be interesting to see those too. Might be a fun NY holiday project right there. Even without decrypting the (hopefully) encrypted traffic, it should be verifiable from the packet sizes.

      > Also just curious, what happens if you block those requests?

      Due to the `*.roku.com` DNS black hole, the Roku showed no ads, but things like Netflix and YouTube using standard Roku apps ("channels") worked fine. I have since moved on to playing content using an Nvidia Shield and blocking outside traffic completely. The only odd thing is that the TV occasionally keeps blinking and complains about the lack of network if I misclick and start something other than the HDMI input.

  • Hashing might not work since the stream itself would be a variable bitrate, meaning the individual pixels would differ and therefore so would the computed file hash.

    • They're using perceptual hashing, not cryptographic hashing of raw pixels. So it's invariant to variable bitrate, compression, etc.
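To illustrate the difference, here is a minimal sketch of an average hash (aHash), one of the simplest perceptual hashes. It is not Roku's algorithm, which isn't public; the 4x4 grayscale "frame" and the per-pixel noise standing in for re-encoding artifacts are made-up values.

```python
import hashlib

def average_hash(pixels):
    """Return a bit string: '1' where a pixel is brighter than the frame's mean."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    return "".join("1" if p > mean else "0" for p in flat)

def hamming(a, b):
    """Count the positions where two equal-length bit strings differ."""
    return sum(x != y for x, y in zip(a, b))

# Hypothetical 4x4 grayscale frame (values 0-255), already downscaled.
frame = [
    [200, 210, 30, 25],
    [190, 205, 35, 20],
    [40, 30, 220, 215],
    [35, 25, 225, 210],
]

# The same picture after lossy re-encoding: every pixel is off by a few levels.
noise = [3, -2, 4, -1]
recompressed = [
    [p + noise[(r + c) % 4] for c, p in enumerate(row)]
    for r, row in enumerate(frame)
]

# Perceptual hashes of the two frames are compared by Hamming distance.
perceptual_distance = hamming(average_hash(frame), average_hash(recompressed))

# A cryptographic hash of the raw pixel bytes changes completely.
crypto_equal = (
    hashlib.sha256(bytes(sum(frame, []))).hexdigest()
    == hashlib.sha256(bytes(sum(recompressed, []))).hexdigest()
)
```

Here the perceptual distance is 0 while the SHA-256 digests differ, which is the property the comment above is describing. Production systems use more robust features than a brightness threshold.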

  • What system do you use to get that level of visibility?

    • Besides what others have said, another dead simple option is to use NextDNS: https://nextdns.io

      Doesn't require running anything locally and supports various block rules and lists and allows you to enable full log retention if you want. I recommend it to non-techies as the easiest way to get something like pi-hole/dnscrypt-proxy. (but of course not being self-hosted has downsides)

      edit: For Roku, DNS blocking like this only works if Roku doesn't use its own resolver. If it's like some Google devices it'll use 8.8.8.8 for DNS resolution ignoring your gateway/DHCP provided DNS server.

    • Replace your router's DNS with something like pi-hole or a bog standard dnsmasq, turn up the logging, that's it. Ubiquiti devices I think also offer detailed DNS logging but not sure.

    • My suggestion would be to configure your own router using a Linux distro. It's not as difficult as it sounds, the kernel already does most of the heavy lifting. All you need to really do is enable packet forwarding and configure the firewall using iptables rules (block all in, allow all out is a reasonable default). I use Unbound as my recursive DNS resolver, together with Hagezi's blacklists to provide DNS filtering. I filter ports 53 and 853, and filter by IP known public DNS servers (Hagezi maintains a list). DHCP is provided by the isc-dhcp-server package on Debian.

      That's a more or less complete home router, with plenty of spare resources to run internal or external services like a Wireguard tunnel, file server, or the Docker/Podman runtime.

      That being said, I still wouldn't connect a "smart" TV to the Internet. There are better options like a Linux HTPC.

    • pfSense firewall. There is a week-long learning curve and it’s best to put it on dedicated hardware.

  • I don’t know why you quoted the addresses.

    • It's polite to give parsers (human or otherwise) hints that they're about to encounter text which is now intended for a different kind of parser.

      I recently forgot to surround my code in ``` and Gemini refused to help with it (I think I tripped a safety guardrail, it thought I was targeting it with an injection attack). Amusingly, the two ways to work around it were to fence off my code with backticks or to just respond to:

      > I can't help you with that

      With

      > Why not?

      After which it was then willing to help with the unquoted code. Presumably it then perceived it as some kind of philosophical puzzle rather than an attack.

    • Fair question, it does look a bit jarring when not rendered. I write a lot of markdown and it's a very strong force of habit to use backticks to sort of highlight a technical term and turn it into a noun. Similar to writing endash as a double hyphen.

      When I read what I write, my eyes glance through backticks and maybe come back if I need to parse the inner term in more detail.

    • Tell me you don't Markdown, without telling me you don't Markdown.

      It's a developer thing, using backticks means the enclosed text is emphasised when rendered from Markdown.

That sounds so expensive it's hard to see it making money. You'd be processing a 2fps video stream for each customer. That's a huge amount of data.

And all that is for the chance to occasionally detect that someone's seen an ad in the background of a stream? Do any platforms even let a streamer broadcast an NFL game like the example given?

  • I used to work for an OTT DSP adtech company i.e. a company that bid on TV ad spots in real time. The bidding platform was handling millions of requests per second, and we were one of the smaller fish in the sea. This system is very real. Your tv is watching what you’re watching. I built the attribution pipeline, which is what this is. If you go buy a product from one of these ads, this is how they track (attribute) it. Not to be alarmist butttt you have zero privacy.

    • The TV thing isn't a new story, this was public. Everyone should have known about it and no one cared. (I could insert a boilerplate rant about Snowden here)

      Those datacenters are not being built so that you can talk to ChatGPT all day, they are being built to generate and optimize ads. People who were not previously very suggestible are going to be. People who are suggestible will have their agency sold off to the highest bidder.

      Avoid owning a TV? Your friends will. Maybe you can not have a FB/IG/WhatsApp account, only use cash, not have a mobile phone, but Meta (or Google, or Apple) can still detect your face in the background of photos/videos and know where you shop, travel and when.

    • This is really interesting. Can you expand on this? What are OTT and DSP in this context?

      Do you have a sense for what data is tracked and how it's used? Or if this sort of system is blind in certain cases? (eg: I hook up an N64 to the a/v ports -- will I get retro game ads on the TV?)

    • >Not to be alarmist butttt you have zero privacy.

      Hence why I will never connect my TV to the internet

    • Soooo.... Why did you build it for them? You didn't have to further enable it. Despise people who just drop this kind of thing without any hint of repentance or contrition.

    • Would love to know what are the best things we can do to prevent this sort of tracking in general. PiHole? Don't re-use emails? On a scale of 1 to fucked are we cooked?

  • I don't think they mean that kinda streamer - the idea is the roku tv can tell you're watching an ad even if it's on amazon prime, apple tv, youtube, twitch, wherever, and associate the ad watching with your roku account to potentially sell that data somehow?

    That way they aren't cut out of the loop by you using a different service to watch something and still have a 'cut'.

    • It'd make sense if they're using streamer in a different sense than I'm used to. I see that's at the bottom of the definitions Google will produce.

  • The actual screenshot isn’t sent, some hash is generated from the screenshot and compared against a library of known screenshots of ads/shows/etc for similarity.

    Not super tough to pull off. I was experimenting with FAISS a while back and indexed screenshots of the entire Seinfeld series. I was able to take an input screenshot (or Seinfeld meme, etc) and pinpoint the specific episode and approx timestamp it was from.
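The lookup side can be sketched roughly like this. To keep it dependency-free, this uses a brute-force Hamming-distance search in plain Python rather than FAISS, and the 16-bit fingerprints, titles, timestamps, and the `max_distance` threshold are all hypothetical.

```python
def hamming(a: int, b: int) -> int:
    """Number of differing bits between two integer fingerprints."""
    return bin(a ^ b).count("1")

# Hypothetical index: 16-bit fingerprint -> (title, timestamp in seconds).
INDEX = {
    0b1100110000110011: ("episode-A", 120),
    0b1111000011110000: ("episode-B", 45),
    0b1010101010101010: ("ad-hard-seltzer", 3),
}

def lookup(fingerprint: int, max_distance: int = 3):
    """Return the closest indexed entry, or None if nothing is near enough."""
    best = min(INDEX, key=lambda known: hamming(known, fingerprint))
    if hamming(best, fingerprint) > max_distance:
        return None
    return INDEX[best]

# A query that differs from episode-A's fingerprint by a single bit.
match = lookup(0b1100110000110111)

# A fingerprint that is far from everything in the index.
unknown = lookup(0b0000000000000000)
```

A linear scan like this only works for a tiny index; a library such as FAISS exists precisely because a real catalog would need approximate nearest-neighbor search over far more entries.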

    • > The actual screenshot isn’t sent, some hash is generated from the screenshot and compared against a library of known screenshots of ads/shows/etc for similarity.

      this is most likely the case, although there's nothing stopping them from uploading the original 4K screengrab in cases where there's no match to something in their database which would allow them to manually ID the content and add a hash or just scrape it for whatever info they can add to your dossier.

  • I assume these systems are calculating an on-device perceptual hash. So not that much data needs to get flown back to the mothership.

  • That's the thing about scaling; you offload the work to the "client" (the TV in this case) and make it do the work, it need not send back more than a simple identifier or string in an API call (of course they'll send more), so they get to use a little bit of your electricity and your TVs processing power to collect data on you and make money, with relatively little required from them, other than some infra to handle the requests, which they would have had anyway to collect the telemetry that makes them money.

    Client side processing like this is legitimate and an excellent way to scale, it just hits a little different when it's being used for something that isn't serving you, the user.

    source: backend developer

  • Confirming how many people have actually seen the ad is worth big bucks. No one wants to pay for ads they cannot confirm, and publishers can make up impressions - if you can catch a publisher making up numbers you might get a huge discount or loads of money back.

  • Not necessarily, it can be done on-device, the screenshot hashed, and the results deduplicated and accumulated over time, then compressed and sent off in a neat package. It'd still be a huge amount of data when you add it all up, but not too different from the volume that e.g. web analytics produces.

    Then server-side the hash is matched to a program or ad and the data accumulated and reduced even further before ending up in someone's analytics dashboard.
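A rough sketch of that on-device pipeline, assuming a made-up sample format of `(timestamp, hash)` pairs and JSON plus zlib as the packaging. None of this reflects what any vendor actually ships; the function names are hypothetical.

```python
import json
import zlib

def batch_fingerprints(samples):
    """Collapse consecutive identical hashes into [hash, first_ts, count] runs."""
    runs = []
    for ts, fp in samples:
        if runs and runs[-1][0] == fp:
            runs[-1][2] += 1          # same content as the previous sample
        else:
            runs.append([fp, ts, 1])  # content changed: start a new run
    return runs

def pack(runs):
    """Serialize the runs compactly and compress them for upload."""
    return zlib.compress(json.dumps(runs, separators=(",", ":")).encode("utf-8"))

# Hypothetical capture: one sample every 0.5 s for 60 s,
# with the on-screen content changing every 20 s.
samples = [(i * 0.5, f"hash{int(i * 0.5) // 20}") for i in range(120)]

runs = batch_fingerprints(samples)
payload = pack(runs)

# What a server would see after decompressing the upload.
restored = json.loads(zlib.decompress(payload))
```

In this toy capture, 120 samples collapse into 3 runs before compression, which is the kind of reduction the comment has in mind.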

  • Are there video "thumbprints" like those that exist for audio (used by SoundHound, etc.), i.e. a compressed set of features that can reliably be linked to unique content? I would expect that is possible and a lot faster to look up for 2 frames a second. If this is the case, the "your device is taking a snapshot every 30 seconds" framing sounds a lot worse than it is (not defending it - it's still something I hope can be legislated away - something can be bad and still exaggerated by media)

    • I've been led to believe those video thumbprints exist, but I know the hash of the perceived audio is often all that is needed for a match of what is currently being presented (movie, commercial advert, music-as-music-not-background, ...).

  • Attribution is very painful and advertisers will pay lots of money to close that loop.

  • Is it? I don’t think you need particularly high fidelity to fingerprint ads/programs.

This is especially annoying and just incredibly creepy -- I was watching a clip of Smiling Friends on YouTube (via my Apple TV), and I suddenly got a banner telling me to watch this on HBO Max.

I never felt more motivated to pi-hole the TV.

It’s far less important for ad-free content. They mainly want to connect your ad watching behaviour to an email and then have loyalty program data connected to the same email so that they can identify which ads convert vs not.

  • It’s still a privacy violation a lot of people would be outraged by if they knew it. Tracking what shows you are watching is a valuable data set.

    • I'm surprised to see how few of my non-technical friends and family actually care about privacy.

    • It’s right there in your TV’s settings though. Personally, I don’t trust them to obey the setting so my TV has no internet and I use an Apple TV.

So potentially completely noncompliant if used in a business. E.g. the screen may show HIPAA-protected data, top secret material, etc.

  • Sending 4k screenshots twice a second to a server would be tremendously bandwidth hungry. My guess is that it's all done locally.
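A back-of-the-envelope check of that claim, assuming uncompressed 8-bit RGB frames and a 64-bit hash per snapshot (both are assumptions; the quoted Roku text only gives the resolution and the capture rate):

```python
WIDTH, HEIGHT = 3840, 2160   # 4K UHD
BYTES_PER_PIXEL = 3          # assumption: uncompressed 8-bit RGB
FRAMES_PER_SECOND = 2        # "roughly twice per second"
HASH_BYTES = 8               # assumption: one 64-bit perceptual hash per frame

# Size of one raw snapshot and the upload rate if raw frames were sent.
raw_frame_bytes = WIDTH * HEIGHT * BYTES_PER_PIXEL
raw_mbit_per_s = raw_frame_bytes * FRAMES_PER_SECOND * 8 / 1e6

# Upload rate if only the hashes were sent.
hash_bit_per_s = HASH_BYTES * FRAMES_PER_SECOND * 8

# How many times larger a raw frame is than its hash.
ratio = raw_frame_bytes // HASH_BYTES
```

Under these assumptions a raw frame is about 24.9 MB, so raw upload would need roughly 398 Mbit/s versus 128 bit/s for hashes. Image compression would shrink the first number a lot, but it would still be orders of magnitude above the hash.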

    • There are probably compact signatures extracted from the screenshots (color profiles, OCR, etc) which are then uploaded later in bulk. You don't need the full original image to be able to reliably uniquely identify the content if you have an index of it already.

  • It is a violation of the VPPA to collect this for streaming services and prerecorded media. Scheduled broadcast and cable TV aren't covered.

    • I thought the 2013 amendment to the VPPA largely defanged it by allowing sharing with customer consent (which is probably one of the clauses in the million-word customer agreement nobody reads).

  • Yeah that’s why Webex is still in business. TVs are a great entry point to LANs.

The PS5 doesn't need to, they get it all in metadata because they control the full stack — TVs do it because they have less control over sources.

  • The PS5 does actually record video all the time in a ring buffer. That’s how when you press the share button, it includes a video of the recent past.

  • Is the PS5 not jailbroken?

    • I'm sure somebody's done it, but mine isn't. I do make sure to pull the microphones out of the controllers at least so while they can watch everything I'm doing on my screen they can't listen to the entire house.

I'd like to weaponize all this scanning into a force for good. Instead of phoning home to Roku, send the fingerprints up to an ADID database registering every ad on the planet. Open up an API so that any video stream can detect an ad and inject Max Headroom replacement clips.

Come on hackers. We could murder the global economy with this shit.

  • I've been thinking about this as well - make a small device that in real time detects ads and turns off audio and video while they're playing. I'd rather see a blank screen than an ad. That way, the whole ad pyramid scheme stays intact while the conversion rates plummet.

    • > while the conversion rates plummet.

      Isn't the segment who will set this up also likely to have a low conversion rate to begin with?

      You'd need to make it so easy that it becomes fully mainstream. I suspect that's what happened to adblockers, it got a bit too "standard" for (Google's) comfort.

    • Same here. I've done this for podcasts (not in real time) and it works great. TV should be easier in some ways since the video stream and captions can also indicate an ad.

The only real question is whether they're doing screen-level analysis or just relying on app telemetry

  • They're definitely doing screen level analysis.

    I work for a company that does some work on Internet advertising and one of the main issues that came up when we discussed supporting smart TV platforms was how we could protect our proprietary advertising audience data while still showing ads on these devices. Knowing what ads we show the user tells them what the user is interested in, which is valuable information for our competitors.

    Unfortunately, we were not able to solve that problem, and instead decided to just use lower fidelity user models for advertising on smart TVs. That makes smart TV ads less valuable, but allows us to keep our competitive advantage on desktop and mobile.

  • If I’m understanding you right, I’m confident it’s screen analysis. I have a Hisense Roku TV I exclusively use with an AppleTV. I get creepy intrusive popups telling me: “you could be watching this on other streaming providers!” all the time. So it “knows” what’s being displayed on the screen regardless of what app (or HDMI input) is being used.

I'm fairly puzzled by my own reaction to this.

I'm indifferent to YouTube having frame-by-frame nanodata about me.

But as a Roku user, this snapshotting makes me very angry.

Maybe because much of what I watch on my TV via my Roku is content I own and stream from my personal server?

  • For me, I despise having different abstractions get crossed.

    I expect my media app, ie. YouTube, to know what I watch from the media app. YouTube knows about YouTube.

    My operating system, ie. Roku, should not know about what's happening inside a given app. ie. Roku does not know about YouTube.

    When they start crossing layers, that greatly upsets me.

Time for me to get Apple TV.

  • This is not sufficient because the TV you are showing the video on can (does/will) take the screencaps.

  • As if it didn’t track your habits as well.

    • ...it doesn't.

      Like, Apple knows what you're watching within the Apple TV app obviously.

      But it's certainly not taking screenshots every second of what it's displaying when you use other apps -- which shows and ads you're seeing. Nor does Apple sell personal data.

      Other video apps do register what shows you're in the middle of, so they can appear on the top row of your home screen. But again, Apple's not selling that info.

Does this apply for external video inputs, outside of the smart TV OS?

I guess I can always just refuse the TV OS access to the wifi, assuming they're not using 4G modems.

> Roughly twice per second, a Roku TV captures video “snapshots” in 4K resolution.

Isn't that too much data to even begin to analyze? The only winner here seems like S3.

  • It runs a hashing algorithm locally, I believe, rather than transmitting the entire image. pHash or something similar would work.