Comment by embedding-shape

3 hours ago

> I don't really know but I don't think most people know it.

That's for sure, most people don't know how much they're being tracked, even if we consider only inside the platform. Nowadays, lots of platforms literally log your mouse movements inside the page, so they can see exactly where you first landed, how you moved around on the page, where you navigated, how long you paused for, and much much more. Basically, if it can be logged and re-constructed, it will be.

> Also a quick question but how long are the logs kept in OpenAI? And are the logs still taken even if you are in private mode?

As far as I know right now, OpenAI is under legal obligation to log all of their ChatGPT chats, regardless of their own policies, but this was a while ago (this summer sometime?), maybe it's different today.

What exactly do you mean by "private mode"? If you mean "incognito/private window" in your browser, it has basically no impact on how much is logged by the platforms themselves, it's all about your local history.

For the "temporary mode" in ChatGPT, I also think it has no impact on how much they log, it's just about not making that particular chat visible in your chat history, and them not using that data for training their model. Besides that, all the tracking in your browser still works the same way, AFAIK.

Wow thanks for your response man.

I was referring to temporary mode when I said that (but I also considered a private window to be much safer as well, but wow, looks like they log literally everything).

So out of all the providers (Gemini, Claude, OpenAI, Grok and others), do they all log everything permanently?

If they are logging everything, what prevents their logs from getting leaked or "accidentally" being used in training data?

> As far as I know right now, OpenAI is under legal obligation to log all of their ChatGPT chats, regardless of their own policies, but this was a while ago (this summer sometime?), maybe it's different today.

I also remember this post, and given the current political environment, that's kind of crazy.

Also, some of these services require a phone number one way or another, and most likely the phone number can somehow be linked to the logs. Since phone numbers are issued by the govt., chances are that if threat actors want data at large and OpenAI's logs contribute to it, a very good profile of a person can be built if they use such services... Wild.

So if OpenAI's under legal obligation, is there a limit on how long they keep the logs, or are they gonna keep them permanently? I am gonna look for the old article from HN right now, but if the answer is permanently, then it's even more dystopian than I imagined.

The mouse tracking ability is wild too. I might use LibreWolf at this point to prevent some of that tracking.

Also what are your thoughts on the new anonymous providers like confer.to (by signal creator), venice.ai etc.? (maybe some openrouter providers?)

  • You can safely assume (and probably better you do regardless) that everyone on the internet is logging and slurping up as much data as they can about their users. Their product team is usually the one using the data, but depending on the amount of controls in the company, it could be that most of it sits in a database that the engineering, marketing and product teams all have access to.

    > If they are logging everything, what prevents their logs from getting leaked or "accidentally" being used in training data?

    The "tracking data" is different from the "chat data". The tracking data is usually collected for the product team to make decisions with, and it is collected automatically in the frontend and backend using various methods.

    The "chat data" is something that they'd keep more secret and guarded typically, probably random engineers won't be able to just access this data, although seniors in the infrastructure team typically would be able to.

    As for how easily that data could slip into training data, I'm not sure, but I'd expect that just the fear of big names suing them could be enough for them to be really careful with it. I guess that's my hope at least.

    I don't know any specific "how long they keep logs" or anything like that, but what I do know, is that typically you try to sit on your data for as long as you can, because you always end up finding new uses for it in the future. Maybe you wanna compare how users used the platform in 2022 vs 2033, and then you'd be glad, so unless the company has some explicit public policy about it, assume they sit on it "forever".

    > Also what are your thoughts on the new anonymous providers like confer.to (by signal creator), venice.ai etc.? (maybe some openrouter providers?)

    Haven't heard about any of them :/ This summer I took it one step further and got myself the beefiest GPU I could reasonably get (for unrelated purposes) and started using local models for everything I do with LLMs.

    • > I don't know any specific "how long they keep logs" or anything like that, but what I do know, is that typically you try to sit on your data for as long as you can, because you always end up finding new uses for it in the future. Maybe you wanna compare how users used the platform in 2022 vs 2033, and then you'd be glad, so unless the company has some explicit public policy about it, assume they sit on it "forever".

      I am gonna assume in this case that the answer is forever.

      I actually looked at Kagi Assistant for this purpose, since someone mentioned it, and created a free Kagi account, but it looks like they are using the AI model APIs themselves, with the logs that come with that. I wouldn't consider it the most private (although Bedrock and AWS apparently say that they keep logs for 30 days, but still :/ I feel like there is still a genuine issue).

      To be honest, I don't want to buy a GPU for my use case either, though :/

      Personally, I am liking either the Proton Lumo models or confer.to (I can't use confer.to on my Mac for some reason, so Proton Lumo it is).

      I am probably gonna go with Proton Lumo + Kagi Assistant/z.ai (with GLM 4.7, which is a crazy good model).

      I am really GPU poor (just got a simple Mac Air M1), but I ran some liquidFM model iirc, and it was good for some extremely basic tasks, but it fumbled when I asked it the capital of Bhutan just out of curiosity.