Comment by bichiliad
8 years ago
@homero (since I can't nest a reply any further), it's not the contents of the crawler's request that get randomly injected into the page the crawler requests, but rather the contents of other requests to the same Cloudflare server.
Imagine I'm having a chat on some website X, which uses Cloudflare. Cloudflare acts as a man in the middle, meaning my request, and the response, likely pass through its memory at some point to allow me to communicate with X.
Later, a Google bot comes along and requests a page from site Y. Because of this bug, random bits of memory that were left around on the Cloudflare server get inserted into the response to the bot's request. Those bits of memory could be from anything that's gone through that server in the past, including my conversations on website X. The bot then assumes that the content that Cloudflare spits out for website Y is an accurate representation of website Y's contents, and it caches those contents. In this way, my data from website X ends up in Google's cached version of website Y.
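To make that sequence concrete, here's a toy sketch in Python. It is only an illustration of the general idea (a shared buffer that gets read past the end of the real response), not Cloudflare's actual code. `ToyProxy`, the `overread` parameter, the message text, and the `site-y.example` hostname are all made up for the example.

```python
class ToyProxy:
    """Hypothetical stand-in for a shared edge server (an assumption, not real code)."""

    def __init__(self, size=64):
        # One chunk of memory that is reused across unrelated requests.
        self.buffer = bytearray(size)

    def handle(self, body: bytes, overread: int = 0) -> bytes:
        # Copy the body into the shared buffer without clearing what was there before.
        self.buffer[:len(body)] = body
        # A correct proxy returns exactly len(body) bytes.
        # The simulated bug returns `overread` extra bytes of leftover memory.
        return bytes(self.buffer[:len(body) + overread])


proxy = ToyProxy()

# 1. My chat on site X passes through the proxy's memory.
proxy.handle(b"site X chat: my secret message")

# 2. A crawler requests a page from site Y and hits the buggy path.
page_y = b"<p>site Y</p>"
crawled = proxy.handle(page_y, overread=17)

# 3. The crawler trusts the response and caches it as site Y's content.
search_cache = {"site-y.example": crawled}
print(search_cache["site-y.example"])
# prints b'<p>site Y</p>my secret message'
```

Nothing in the crawler's own request causes the leak here. The extra bytes are simply whatever an earlier, unrelated request left behind in the same memory.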