Comment by londons_explore
5 years ago
The main way is via the Googlebot crawler.
They also use user reports from Chrome, and links in "mark phishing" emails from Gmail. In those latter two cases the URL is considered private data, so it won't be reported in webmaster tools.
We’ve seen internal firewalled URLs in the webmaster tools, so I’m not sure the private-data handling works as intended.
I've seen some bot of Google's in the server logs on my in-construction, not-publicly-available page, a minute after I opened the page in Chrome. That was about five or six years ago, shortly before I stopped using Chrome.
Maybe there is some kind of "if multiple users see the same URL, it isn't private" logic going on.