Comment by matthewdgreen

20 hours ago

A simple answer to this is: I use local storage or end-to-end encrypted cloud backup for private stuff, and I don't for work stuff. And I make those decisions on a document-by-document basis, since I have the choice of using both technologies.
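For concreteness, a minimal sketch of what I mean by "end-to-end encrypted" here, assuming Python's cryptography package (the upload step is hypothetical): the key is generated and kept on my device, so the cloud provider only ever holds ciphertext it cannot read.

```python
# Minimal sketch of end-to-end encrypted backup: the key never leaves
# the client, so the provider stores only ciphertext.
from cryptography.fernet import Fernet

key = Fernet.generate_key()       # generated and kept locally, never uploaded
cipher = Fernet(key)

document = b"private notes"
blob = cipher.encrypt(document)   # this opaque blob is all the cloud sees
# upload_to_cloud(blob)           # hypothetical upload step

# Only the holder of the local key can recover the plaintext.
assert cipher.decrypt(blob) == document
```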

The question you are asking is: should I approach my daily search tasks with the same degree of thoughtfulness and caution that I do with my document storage choices, and do I have the same options? And the answers I would give are:

* As a consumer I don't want to have to think about this. I want to be able to ask some private questions or have conversations with a trusted confidant without those conversations being logged to my identity.

* As an OpenAI executive, I would also probably not want my users to have to think about this risk, since a lot of the future value in AI assistants is the knowledge that you can trust them like members of your family. If OpenAI can't provide that, something else will.

* As a member of a society, I really do not love the idea that we're using legal standards developed for 1990s email to protect citizens from privacy violations involving technologies that can think and even testify against you.

> [...] should I approach my daily search tasks with the same degree of thoughtfulness and caution that I do with my document storage choices [...]

Then treat them with the same degree of thoughtfulness and caution with which you have treated web searches on Google, Bing, DuckDuckGo or Kagi for the last decade.

Again, there is no confidant or entity here, any more than there is in the search algorithms we have been using for decades.

> I really do not love the idea that we're using legal standards developed for 1990s email to protect citizens [...]

Fair, but again, that is in no way connected to LLMs. I still see no reason presented why LLM input should be treated any differently to cloud hosted files or web search requests.

You want better privacy? Me too, but that is not in any way connected to, or changed by, LLMs becoming commonplace. By the same logic, I find any attempt to restrict a specific social media company over privacy and algorithmic concerns laughable if the laws remain such that any local competitor is allowed to carry out the same invasions.

  • It's not at all clear how easy it is to obtain a user's search history when users don't explicitly log in to those services (e.g., incognito/private browsing) and don't keep history on their local device. I've been trying to find a single example of a court case where this happened, and my Google/ChatGPT searches are coming up completely empty. Tell me if you can find one.

    The closest I can find is "keyword warrants" where police ask for users who searched on a given term, but that's not quite the same thing as an exhaustive search history.

    Certainly my personal intuition is that historically there has been a lot of default privacy for non-logged in "incognito" web search, which used to be most search -- and is also, I think, why we came to trust search so much. I expect that will change going forward, and most LLMs require user logins right from the jump.

    As far as the "I can see no reason" why LLMs should be treated differently than email, well, there are plenty of good reasons why we should. If you're saying "we can't change the law," you clearly aren't paying attention to how the law has been changing around tech priorities like cryptocurrency recently. AI is an even bigger priority, so a lot of opportunity for big legal changes. Now's the time to make proposals.

    • > [...] single example of a court case where this happened, and my Google/ChatGPT searches are coming up completely empty.

      There are a massive number, which is part of why I am both surprised and starting to feel that this discussion stems from some people being unaware of the tracking they have tolerated for decades. These cases have been discussed to no end, covered by the usual suspects like the EFF, and constantly get (re)reported across the media in "Incognito mode is not incognito" pieces.

      Heck, some I know from memory [0]; the rest one can find with a simple ten-second search [1].

      > [...] my personal intuition is that historically there has been a lot of default privacy for non-logged in "incognito" web search [...]

      There has not been. No need for intuition, or to take my word for it: just read the privacy information Google provides [2] as part of the cookie banner whenever you access their sites (whether in an incognito instance or otherwise), information that was available for the decade beforehand if one looked for it.
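      To make the mechanism concrete, here is an illustrative sketch, not Google's actual method, of how a handful of passive request attributes, sent identically in any browser mode, can be hashed into a stable per-browser identifier (attribute names and values are invented for illustration):

      ```python
      # Illustrative only: passive request attributes are identical in and out
      # of incognito mode, so hashing them yields the same identifier either way.
      import hashlib

      def fingerprint(request_attrs: dict) -> str:
          stable = "|".join(
              request_attrs.get(k, "") for k in (
                  "ip", "user_agent", "accept_language", "accept_encoding",
              )
          )
          return hashlib.sha256(stable.encode()).hexdigest()[:16]

      # Example values (invented). Opening an incognito window changes none of them.
      attrs = {
          "ip": "203.0.113.7",
          "user_agent": "Mozilla/5.0 (X11; Linux x86_64) ...",
          "accept_language": "en-US,en;q=0.9",
          "accept_encoding": "gzip, deflate, br",
      }
      print(fingerprint(attrs))  # same output for the same browser, any mode
      ```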

      > As far as the "I can see no reason" why LLMs should be treated differently than email, well, there are plenty of good reasons why we should.

      Not email. I never said email. If you are going to use quotation marks, please quote accurately ("I still see no reason presented why LLM input should be treated any differently to cloud hosted files or web search requests." is what I wrote, and it means something very different); I do the same for you.

      Neither you nor anyone else has provided a reason why LLM input is inherently different to other files hosted online. Happy to read those "plenty of good reasons", but they have yet to be shared.

      > If you're saying "we can't change the law," [...]

      I did not. I asked why existing laws should be applied differently in the case of LLM input, and/or why changes are somehow needed specifically or suddenly for LLMs.

      This really seems to come down to LLMs "feeling" different to some because they can be anthropomorphized, and that feeling somehow warranting different treatment, when it is an illusion.

      Considering your belief that "historically there has been a lot of default privacy for non-logged in "incognito" web search", it honestly sounds like you see less room for stricter regulation than my long-immersed-in-this-topic self does.

      If I could implement any change, I would start with more consistent and transparent information for users at all times, which might dispel some misconceptions and help users make more informed decisions even if they don't read the privacy policy.

      I have always liked a traffic-light system as a concept (a rough sketch follows below). Then again, Chrome already tells you something along those lines when you open incognito mode, and somehow inaccurate assumptions persist about what that mode actually does and doesn't do.
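      Purely as a hypothetical sketch of that concept, with invented practice flags and thresholds rather than any existing standard:

      ```python
      # Hypothetical traffic-light label derived from a provider's declared
      # data practices; flags and thresholds are invented for illustration.
      def privacy_light(practices: set[str]) -> str:
          RED = {"sells_data", "links_to_identity", "indefinite_retention"}
          AMBER = {"logs_queries", "third_party_sharing"}
          if practices & RED:
              return "red"
          if practices & AMBER:
              return "amber"
          return "green"

      print(privacy_light({"logs_queries"}))                       # amber
      print(privacy_light({"logs_queries", "links_to_identity"}))  # red
      print(privacy_light(set()))                                  # green
      ```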

      TL;DR:

      Yes, search engine providers are able to identify users in incognito mode. Such tracking has always been public information, not least because they have to include it in their privacy policies.

      Yes, such tracking has been used in court cases, in the US and elsewhere, to identify users and link them to their search requests done whilst using such modes.

      No, LLM input is no different to search requests or files hosted online. Or at least, no one has said why it is different; happy to hear arguments to the contrary, though.

      [0] https://www.classaction.org/media/brown-et-al-v-google-llc-e... (Google was forced to remediate billions (yes, with a b) of “Incognito” browsing records which, according to plaintiffs, precisely identified users at the time, including linking them to their existing, not-logged-in Google accounts. Note that this is one of two (US-specific) cases I knew off the top of my head; the other was the Walshe murder, though there is no (public) information on whether incognito was used in that case: https://www.youtube.com/watch?v=cnA6XwVQUHY)

      [1] https://law.justia.com/cases/colorado/supreme-court/2023/23s... and https://www.documentcloud.org/documents/23794040-j-s10032-22...

      [2] https://policies.google.com/privacy?hl=en-US ("When you’re not signed in [...] we store the information we collect with unique identifiers tied to the browser, application, or device you’re using.", "This information is collected regardless of which browser or browser mode you use [...] third party sites and apps that integrate our services may still share information with Google.", I think you get the point. There never was any "default privacy for non-logged in "incognito" web search", and I can assure you that data has always been more than sufficient to fingerprint a unique user.)
