Comment by kinkrtyavimoodh

8 years ago

Imagine this: Google sends a request to malformedhtml.com to crawl it. This site's HTML happens to have that weird incomplete tag problem they mentioned. The site is served through Cloudflare, where a buggy script ends up inserting some data from the server's memory into the HTML that it returns to Google. That memory contains HTTP request headers etc. of _completely unrelated websites_ that are also behind CF.

Google gets this HTML and caches it, and that's how the leaked data ends up there.
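
To make the mechanism above concrete, here is a toy Python simulation. It is only a sketch under loud assumptions: the flat `memory` buffer, the function names `buggy_rewrite` and `safe_rewrite`, and the header contents are all made up, and this is not Cloudflare's actual code. It just shows how an end-of-buffer check written as "equals" instead of "less than" can be stepped over when a page ends in an incomplete tag.

```python
# Toy model only: one flat buffer stands in for the proxy's process memory.
# All names and contents are hypothetical, not taken from any real system.

def buggy_rewrite(memory: bytes, start: int, end: int) -> bytes:
    """Copy memory[start:end] to the output, using a skippable end check."""
    out = bytearray()
    pos = start
    # BUG: the loop only stops if pos lands exactly on end.
    # (The len() guard exists so this toy terminates; C has no such guard.)
    while pos != end and pos < len(memory):
        # A "<" is assumed to always be followed by a tag-name byte,
        # so both bytes are consumed in one step.
        step = 2 if memory[pos] == ord("<") else 1
        out += memory[pos:pos + step]
        pos += step
    return bytes(out)


def safe_rewrite(memory: bytes, start: int, end: int) -> bytes:
    """Same loop, but the check and the copy both respect the page boundary."""
    out = bytearray()
    pos = start
    while pos < end:
        step = 2 if memory[pos] == ord("<") else 1
        out += memory[pos:min(pos + step, end)]
        pos += step
    return bytes(out)


if __name__ == "__main__":
    page = b"<p>hello</p><"  # ends with an incomplete tag
    # Hypothetical headers of an unrelated site's request, sitting right
    # after the page in the same memory.
    neighbor = b"GET /inbox HTTP/1.1\r\nHost: unrelated.example\r\nCookie: session=SECRET\r\n\r\n"
    memory = page + neighbor
    print(buggy_rewrite(memory, 0, len(page)))
    print(safe_rewrite(memory, 0, len(page)))
```

With a well-formed page the two functions return the same thing. With the trailing `<`, the buggy version jumps from one byte before the end to one byte past it, never sees `pos == end`, and keeps copying the neighbor's headers into the response that the crawler then stores.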