Comment by matthewdgreen

19 hours ago

It's not at all clear how easy it is to obtain a user's search history when the user doesn't explicitly log in to those services (e.g., uses incognito/private browsing) and doesn't keep history on their local device. I've been trying to find a single example of a court case where this happened, and my Google/ChatGPT searches are coming up completely empty. Tell me if you can find one.

The closest I can find is "keyword warrants", where police ask for users who searched on a given term, but that's not quite the same thing as an exhaustive search history.

Certainly my personal intuition is that historically there has been a lot of default privacy for non-logged in "incognito" web search, which used to be most search -- and that is also, I think, why we came to trust search so much. I expect that will change going forward, not least because most LLMs require user logins right from the jump.

As far as the "I can see no reason" why LLMs should be treated differently than email goes, well, there are plenty of good reasons why we should. If you're saying "we can't change the law," you clearly aren't paying attention to how the law has been changing recently around tech priorities like cryptocurrency. AI is an even bigger priority, so there is a lot of opportunity for big legal changes. Now's the time to make proposals.

> [..] single example of a court case where this happened, and my Google/ChatGPT searches are coming up completely empty.

A massive number, which is part of why I am both surprised and starting to feel like this discussion stems from some people being unaware of the tracking they have tolerated for decades. These cases have been discussed to no end, covered by the usual suspects like the EFF, and constantly get (re)reported across the media in "Incognito mode is not incognito" pieces.

Heck, some I know from memory [0]; the rest one could find with a simple ten-second search [1].

> [...] my personal intuition is that historically there has been a lot of default privacy for non-logged in "incognito" web search [...]

There has not been. No need for intuition or to take my word for it: just read the privacy information Google provides [2] whenever you access their sites (whether in an incognito instance or otherwise) as part of the cookie banner (and, for the decade beforehand, if one looked for it).

> As far as the "I can see no reason" why LLMs should be treated differently than email, well, there are plenty of good reasons why we should.

Not email. I never said email. If you are going to use quotation marks, please quote accurately ("I still see no reason presented why LLM input should be treated any differently to cloud hosted files or web search requests." is what I wrote, and it means something very different); I extend you the same courtesy.

Neither you nor anyone else has provided a reason why LLM input is inherently different from other files hosted online. Happy to read those "plenty of good reasons", but they have yet to be shared.

> If you're saying "we can't change the law," [...]

I did not. I asked why existing laws should be applied differently in the case of LLM input, and/or why changes are somehow needed for LLMs specifically or suddenly.

This really seems to be a case of LLMs "feeling" different to some people because they can be anthropomorphized, and of that feeling somehow warranting different treatment, when it is an illusion.

Considering your belief that "historically there has been a lot of default privacy for non-logged in "incognito" web search", it honestly sounds like you see less room for stricter regulation than I do, and I have long been immersed in this topic.

If I could implement any change, I would start with more consistent and transparent information for users at all times, which might dispel some misconceptions and help users make more informed decisions even if they don't read the privacy policy.

I have always liked a traffic light system as a concept. Then again, that is essentially what Chrome already tells you when you open incognito mode, and somehow there still seem to be inaccurate assumptions about what that mode actually does and doesn't do.

TL;DR:

Yes, search engine providers are able to identify users in incognito mode. Such tracking has always been public information, not least because they have to disclose it in their privacy policies.

Yes, such tracking has been used in court cases, in the US and elsewhere, to identify users and link them to the search requests they made whilst using such modes.

No, LLM input is no different from search requests or files hosted online. Or at least, no one has said why LLM input is different; happy to hear arguments to the contrary, though.

[0] https://www.classaction.org/media/brown-et-al-v-google-llc-e... (Google was forced to remediate billions (yes, with a b) of “Incognito” browsing records which, according to plaintiffs, precisely identified users at the time, including being able to link them to their existing, not-logged-in Google accounts. Note that this is one of two (US-specific) cases I knew off the top of my head; the other was the Walshe murder, though there is no (public) information on whether incognito was used in that case: https://www.youtube.com/watch?v=cnA6XwVQUHY)

[1] https://law.justia.com/cases/colorado/supreme-court/2023/23s... and https://www.documentcloud.org/documents/23794040-j-s10032-22...

[2] https://policies.google.com/privacy?hl=en-US ("When you’re not signed in [...] we store the information we collect with unique identifiers tied to the browser, application, or device you’re using.", "This information is collected regardless of which browser or browser mode you use [...] third party sites and apps that integrate our services may still share information with Google." I think you get the point. There never was any "default privacy for non-logged in "incognito" web search", and I can assure you that data has always been more than sufficient to fingerprint a unique user.)
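To make that last claim concrete, here is a minimal back-of-the-envelope sketch of why a handful of passively observable browser attributes is enough to single out one user among billions, even without a login or cookies. The per-attribute entropy figures are illustrative assumptions in the spirit of the EFF's Panopticlick study, not measured values:

```python
import math

# Illustrative, ASSUMED per-attribute entropy estimates in bits
# (in the spirit of EFF's Panopticlick study; not measured values).
attributes = {
    "user_agent": 10.0,        # browser, version, and OS string
    "screen_resolution": 4.8,
    "timezone": 3.0,
    "installed_fonts": 13.9,
    "accept_language": 2.5,
}

combined_bits = sum(attributes.values())

# Bits needed to uniquely distinguish one of ~8 billion people.
needed_bits = math.log2(8_000_000_000)  # ~32.9 bits

print(f"combined entropy: {combined_bits:.1f} bits")
print(f"needed for ~8e9 users: {needed_bits:.1f} bits")
print("sufficient to fingerprint:", combined_bits > needed_bits)
```

None of these attributes requires signing in or a persistent cookie, which is why incognito mode does not, by itself, prevent this kind of identification.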

  • I was retained as an expert witness in some of the cases involving Google, so of course I’m aware that Google keeps logs. (In general on HN I’ve found it’s always helpful to assume the person you’re arguing with might be a domain expert on the topic you’re arguing about; it’s saved me some time in the past.)

    But Google’s internal logging is not the question I’m asking. I’m asking: can you find a single criminal case in the literature where police caused Google to disgorge a complete browsing history for someone who took even modest steps not to record it (i.e., browsed logged out)? Other than keyword search warrants, there doesn’t seem to be much. This really surprised me, since as an expert I “know” that Google has enough internal data to reconstruct this information. Yet from the outside (the experience that matters to people), they’ve managed to operate a product where real-world privacy expectations have been pretty high if you take even modest steps. I think this is where we get many of our privacy expectations from: the actual real-world lived expectations of privacy are much closer to what we want than what’s theoretically possible, or than what will be possible in a future LLM-enabled surveilled world.