Comment by klik99
6 days ago
Are there video "thumbprints" like there are for audio (used by SoundHound etc.) - i.e. a compressed set of features that can reliably be linked to unique content? I would expect that's possible, and a much faster lookup at 2 frames a second. If that's the case, the "your device is taking a snapshot every 30 seconds" framing sounds a lot worse than what's actually happening (not defending it - it's still something I hope can be legislated away - something can be bad and still exaggerated by media)
There are perceptual hashing algorithms for images/video/audio (both DSP- and ML-based) that could work for that.
Given that the TV is trying to match one digital frame against another digital frame, you could probably get decent results even with something super naive like downsampling to a very low resolution, quantizing the color palette, then looking for a pixel-for-pixel match.
All this could be done long before any sort of TV-specific image processing, so the only source of "noise" I can think of would be from the various encodings offered by the streaming service (e.g. different resolutions and bitrates). With the right choice of downsample resolution and color quantization I have to imagine you could get acceptable results.
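For concreteness, a minimal sketch of the naive approach described above: downsample each frame to a tiny fixed resolution, coarsely quantize the colors, and compare the result byte for byte. It assumes Pillow is available; the function names and the specific parameters (16x9, 3 bits per channel) are illustrative choices, not what any actual TV does.

    # Naive frame fingerprint: shrink, quantize, then compare exactly.
    # Assumes Pillow is installed; sizes and bit depths are arbitrary choices.
    from PIL import Image

    FINGERPRINT_SIZE = (16, 9)  # very low resolution, roughly 16:9

    def fingerprint(frame_path: str) -> bytes:
        """Reduce a frame to a small byte string that mostly survives re-encoding."""
        img = Image.open(frame_path).convert("RGB")
        img = img.resize(FINGERPRINT_SIZE, Image.LANCZOS)  # downsample hard
        # Quantize each channel to 3 bits (8 levels) so small differences
        # between streaming encodes collapse to the same value.
        return bytes(v >> 5 for v in img.tobytes())

    def frames_match(path_a: str, path_b: str) -> bool:
        """Pixel-for-pixel match on the reduced representation."""
        return fingerprint(path_a) == fingerprint(path_b)

The exact-equality comparison is the weak point: one pixel that lands on the other side of a quantization boundary breaks the match, which is why real systems use hashes that tolerate a few flipped bits.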
That's basically what phash does
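For reference, a sketch of what a pHash-style perceptual hash does: shrink to 32x32 grayscale, take a 2D DCT, keep the low-frequency 8x8 corner, and set each bit by comparing against the median coefficient. This assumes NumPy, SciPy, and Pillow; libraries like imagehash implement the real thing. Matching is a small Hamming distance rather than exact equality, which makes it far more robust to re-encoding than the naive version above.

    # pHash-style perceptual hash: DCT of a shrunken grayscale frame,
    # thresholded against the median of the low-frequency coefficients.
    import numpy as np
    from PIL import Image
    from scipy.fft import dct

    def phash(frame_path: str, hash_size: int = 8, highfreq_factor: int = 4) -> int:
        size = hash_size * highfreq_factor                       # 32x32 input
        img = Image.open(frame_path).convert("L").resize((size, size), Image.LANCZOS)
        pixels = np.asarray(img, dtype=np.float64)
        freq = dct(dct(pixels, axis=0, norm="ortho"), axis=1, norm="ortho")
        low = freq[:hash_size, :hash_size]                       # keep low frequencies
        bits = (low > np.median(low)).flatten()
        return int("".join("1" if b else "0" for b in bits), 2)  # 64-bit hash

    def hamming(a: int, b: int) -> int:
        """Bits that differ; a small distance means 'perceptually the same frame'."""
        return bin(a ^ b).count("1")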
I've been led to believe those video thumbprints exist, but I know a hash of the perceived audio is often all that's needed to match what's currently being presented (a movie, a commercial, music-as-music rather than background music, ...).
This is why a lot of series uploaded to YouTube are sped up, slowed down, or have their audio's pitch shifted; if the uploader doesn't do this, the video gets flagged by YouTube as infringing content.