Comment by embedding-shape

5 hours ago

You can safely assume (and it's probably better that you do regardless) that everyone on the internet is logging and slurping up as much data as they can about their users. Their product team is usually the one using the data, but depending on how many controls the company has in place, most of it could be sitting in a database that engineering, marketing, and the product team all have access to.

> If they are logging everything, what prevents their logs from getting leaked or "accidentally" being used in training data?

The "tracking data" is different from "chat data", the tracking data is usually collected for the product team to make decisions with, and automatically collected in the frontend and backend based on various methods.

The "chat data" is something that they'd keep more secret and guarded typically, probably random engineers won't be able to just access this data, although seniors in the infrastructure team typically would be able to.

As for how easily that data could slip into training data, I'm not sure, but I'd expect that just the fear of big names suing them could be enough to make them really careful with it. I guess that's my hope at least.

I don't know any specifics about how long they keep logs or anything like that, but what I do know is that you typically try to sit on your data for as long as you can, because you always end up finding new uses for it in the future. Maybe you want to compare how users used the platform in 2022 vs 2033, and then you'd be glad you kept it. So unless the company has some explicit public policy about it, assume they sit on it "forever".

> Also what are your thoughts on the new anonymous providers like confer.to (by signal creator), venice.ai etc.? (maybe some openrouter providers?)

Haven't heard of any of them :/ This summer I took it one step further, got myself the beefiest GPU I could reasonably get (for unrelated purposes), and started using local models for everything I do with LLMs.
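
For anyone curious what that looks like in practice, here's a minimal sketch, assuming you run something like Ollama (or llama.cpp's server) locally and talk to it over its OpenAI-compatible API; the model name is just an example of whatever you've pulled:

```python
# Minimal sketch: chatting with a locally hosted model so nothing leaves your machine.
# Assumes an OpenAI-compatible server (e.g. Ollama) is running on localhost:11434.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",  # local server, not a cloud API
    api_key="not-needed-locally",          # placeholder; local servers usually ignore it
)

resp = client.chat.completions.create(
    model="llama3.1:8b",  # example name; use whatever model you have pulled locally
    messages=[{"role": "user", "content": "Summarize this note without sending it anywhere."}],
)
print(resp.choices[0].message.content)
```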

> I don't know any specifics about how long they keep logs or anything like that, but what I do know is that you typically try to sit on your data for as long as you can, because you always end up finding new uses for it in the future. Maybe you want to compare how users used the platform in 2022 vs 2033, and then you'd be glad you kept it. So unless the company has some explicit public policy about it, assume they sit on it "forever".

I am gonna assume in this case that the answer is forever.

I actually looked at Kagi Assistant for this, as someone mentioned it, and created a free Kagi account, but it looks like they're calling the AI models' APIs themselves, with the logging that comes with that. I wouldn't consider it the most private option (although Bedrock and AWS say they keep logs for 30 days, but still :/ I feel like there is still a genuine issue).

I don't want to buy a GPU for my use case though, to be honest :/

Personally I'm liking either the Proton Lumo models or confer.to (I can't use confer.to on my Mac for some reason, so Proton Lumo it is).

I'm probably gonna settle on Proton Lumo + Kagi Assistant/z.ai (with GLM 4.7, which is a crazy good model).

I'm really GPU poor (I just have a basic MacBook Air M1), but I ran some LiquidFM model IIRC and it was good for some extremely basic tasks, though it fumbled when I asked it the capital of Bhutan just out of curiosity.
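
If anyone wants to try the same on a low-RAM Mac, here's a minimal sketch using llama-cpp-python with a small quantized GGUF; the model path is a placeholder, pick anything that fits in ~8 GB of RAM:

```python
# Minimal sketch: running a small quantized model directly on an M1 Air with llama-cpp-python.
# The GGUF path below is a placeholder, not a real file.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/small-model-q4_k_m.gguf",  # placeholder; any small quantized model
    n_ctx=2048,      # keep the context modest so it fits in memory
    n_threads=4,     # M1 Air has 4 performance cores
    verbose=False,
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "What is the capital of Bhutan?"}],
    max_tokens=32,
)
# Small models can still fumble trivia like this, as noted above.
print(out["choices"][0]["message"]["content"])
```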