
Comment by andybak

1 month ago

Users of my (free, open-source) app seem surprised to learn that we've got zero insight into usage patterns. There are situations where a small amount of anonymous telemetry would be extremely helpful but I'm not going to touch it with a barge-pole.

Opt-in makes the data useless - not just in terms of the huge drop in quantity but because of the fact it introduces a huge bias in the data selected - the people that would opt-in are probably not a good sample of "typical users".

Opt-out, no matter what safeguards or assurances I could provide, is unacceptable to a subset of users, and they will forcefully communicate this to you.

Don't get me wrong: I understand both the ease with which bad actors abuse telemetry and the ease with which "anonymous data" can prove to be nothing of the kind in a multitude of surprising ways.

But it's hard not to feel a little sad in a "this is why we can't have nice things" kind of way.

I can't remember where I saw this before. However, there was a site that collected analytics data client-side in a circular buffer (or something), and there was a menu in the settings to send it back once or always, or to download it yourself. If you experienced an error, they would pop up a toast asking you to share the analytics data with them so they could help fix the problem. You could, of course, decline.

That was probably the best system I'd seen, but I can't remember what site it was.
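A minimal Python sketch of that pattern, assuming a fixed-size in-memory buffer and a hypothetical three-way sharing setting ("ask", "always", "never"). The `LocalAnalytics` class and the `ask_user` / `send` callbacks are illustrative stand-ins for whatever UI and upload code an app actually has.

```python
import json
import time
from collections import deque
from typing import Callable


class LocalAnalytics:
    """Keeps recent events on the device only; nothing leaves without consent."""

    def __init__(self, capacity: int = 500, share_mode: str = "ask") -> None:
        # deque(maxlen=...) acts as a circular buffer: old events fall off the front.
        self._events: deque = deque(maxlen=capacity)
        # Hypothetical setting: "ask" (one-time, per error), "always", or "never".
        self.share_mode = share_mode

    def record(self, name: str, **details) -> None:
        self._events.append({"time": time.time(), "event": name, **details})

    def export(self) -> str:
        """The same JSON the user can download, inspect, or choose to send."""
        return json.dumps(list(self._events), indent=2)

    def on_error(self, ask_user: Callable[[str], bool], send: Callable[[str], None]) -> bool:
        """Called after an error; returns True only if data was actually sent."""
        if self.share_mode == "never":
            return False
        payload = self.export()
        if self.share_mode == "always" or ask_user(payload):
            send(payload)
            return True
        return False  # the user declined the toast
```

The user is shown exactly the payload that would be sent, and the buffer's fixed size bounds how much history can ever accumulate.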

  • On macOS (maybe the Tiger or Leopard era), Apple used to pop up a crash dialog with a "Send to Apple?" prompt, and you could say no.

    They did away with that.

  • I built the same for my browser extension (an effectively dead product). I would love to know if this pattern has a name so I can share it more widely!

  • Maybe the Datadog Flare works like this?

    • The first time I used a flare with their support agents, it truly felt like magic. It's such a clever way to collect data for a specific, pressing need without a dragnet of constant-use telemetry (as far as I'm aware).

Consent is the key issue binding it all. There is a complete lack of consent when there is no opt-out, and consent is greatly degraded when the default is opt-out. Trust is the only means to consent.

1) Opt-in, Opt-survey, Opt-out: this is the only three-way choice that builds trust. The survey option actively validates trust and helps with low-bandwidth communication. The question should be presented to the end user the first time they use the app, or the next time the application starts after this feature is added.

2) Show the end user the exact analytical information you want, so they can inspect it too. Letting users evaluate for themselves what they are allowing to be shared, by providing the reports or views, improves trust.

3) A known privilege in return for trust leads to more consent. Priority support for features and bugs could be offered to those who opt in. Their analytical history / performance data may also help in solving a recently reported bug.
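The three-way choice in point 1, and the rule about when to ask, might look like this as a minimal sketch. The enum and function names, and storing the choice as a string, are assumptions; "opt-survey" is read here as "ask me each time a survey is due".

```python
from enum import Enum
from typing import Optional


class TelemetryConsent(Enum):
    OPT_IN = "opt-in"          # send telemetry routinely
    OPT_SURVEY = "opt-survey"  # ask each time a survey is due
    OPT_OUT = "opt-out"        # never send anything


def parse_choice(stored: Optional[str]) -> Optional[TelemetryConsent]:
    """Turn a stored setting into a choice; missing or unknown values mean 'not asked yet'."""
    try:
        return TelemetryConsent(stored) if stored is not None else None
    except ValueError:
        return None


def needs_prompt(stored: Optional[str]) -> bool:
    """Ask on first run, or on the first start after this feature ships (no choice stored)."""
    return parse_choice(stored) is None


def may_send(stored: Optional[str], approved_this_survey: bool = False) -> bool:
    choice = parse_choice(stored)
    if choice is TelemetryConsent.OPT_IN:
        return True
    if choice is TelemetryConsent.OPT_SURVEY:
        return approved_this_survey
    return False  # opt-out, or no answer yet: send nothing
```

Treating an unreadable stored value as "not asked yet" means the app re-asks rather than guessing in either direction.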

Apple, Microsoft, Google, and others are ambiguous about their analytics sharing, giving no details on how they use the data or how they could abuse it. Most don't even provide an opt-out. I don't trust these organizations, but I must engage with them throughout my life. I don't have to use Facebook or Twitter, so I don't. I accept the Steam survey.

An RFC with an agreed-upon analytics standard could be a step toward solving the lack of analytical information that the open-source community would benefit from, with both parties consenting to an agreed-upon form of communication.

*My point of view: metadata is still personal data. Without the user, neither the data nor the metadata would exist. Since the end user is the source of the metadata, they own both the metadata and the data.

  • Yes - I understand but in many (or even most) cases, opt-in makes the data worthless. There's literally no point collecting it.

    • Building and growing trust makes the data less worthless, to the point of being useful. More people will opt in when they trust the company / the developer(s). Opt-in without a push for universal trust-building in the community keeps producing this worthless data.

      The only way forward I see would be a community-driven effort to build that trust through the means above and/or other ideas. This is not an easy problem to solve and would take time.

      *Even US agencies like the CDC and FBI must use biased data for decision-making, since not all states and organizations self-report.

Would there be a way to do the stats gathering on device, then once every few months send a popup with statistics?

Not sure what bias it adds.

Something like:

"Hey, we make this app and we care about privacy. Here is the information we have gathered about your usage over the past month. Can we send it to ourselves so that we can use it to improve the app?"

And then show a human-readable form of the data that was collected.
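A minimal sketch of that flow, assuming simple per-feature counters are enough: stats are aggregated on the device, rendered as a human-readable summary, and sent only if the user approves. `UsageStats` and the `ask_user` / `send` callbacks are hypothetical, and resetting the counters after each prompt (whether or not the user agreed) is an assumed design choice.

```python
from collections import Counter
from typing import Callable, Dict


class UsageStats:
    """Counts feature use on the device; nothing is sent until the user approves."""

    def __init__(self) -> None:
        self.feature_uses: Counter = Counter()

    def record(self, feature: str) -> None:
        self.feature_uses[feature] += 1

    def human_readable(self) -> str:
        lines = ["Here is everything we gathered on this device this period:"]
        for feature, count in self.feature_uses.most_common():
            lines.append(f"  - {feature}: used {count} time(s)")
        return "\n".join(lines)

    def periodic_prompt(
        self,
        ask_user: Callable[[str], bool],
        send: Callable[[Dict[str, int]], None],
    ) -> bool:
        """Show the summary; send exactly the data shown, and only if the user says yes."""
        approved = ask_user(self.human_readable())
        if approved:
            send(dict(self.feature_uses))
        # Assumption: start a fresh period whether or not the user agreed.
        self.feature_uses.clear()
        return approved
```

Because the summary and the payload come from the same counters, the user approves exactly what is sent.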

  • Just as a reference of existing implementations of this: This is essentially how Valve/Steam collects hardware details from users/clients. Every now and then, a popup appears asking the user if they'd like to participate in the "Hardware Survey", together with all the data that would be submitted if they accept.

    Seems to me like a great implementation.

  • The podcast app I use, AntennaPod (far better for me than other apps, available on F-Droid, no affiliation!), just gave me a local-only year in review. I thought it was a great touch, and I would have been happy to share the data from it with the app's makers.

  • You'd still have extremely biased data: people who blindly click OK on every pop up are not representative of your typical user; people who get nightmares after hearing the word "telemetry" and will grab the pitchforks at any hint of it will always refuse, but, depending on your app, they might be your typical user (e.g. for a self-hosted picture sync and catalogue app, the target audience is people who don't trust Apple/Google/Amazon/Dropbox to store their images privately).

    • I do find myself on the “private first” side…but also keep in mind that those who grab for pitchforks in defense of privacy aren’t a representative sample of the typical user either. (A purely statistical statement).

      It’s very easy to confuse ‘loud protest from a small minority’ and the majority opinion. If a plurality of users chose to participate in an analytics program when asked and don’t care to protest phone-home activities when they’re discovered, then that’s where the majority opinion likely lies.

    • > people who blindly click OK on every pop up are not representative of your typical user

      You could unbias the data by including a metric for how long it took them to click "Ok" and whether they actually reviewed the data before agreeing.

  • This sort of sounds like the Steam Hardware Survey. They do not collect the data willy-nilly, they ask you every few months if you want to participate in a one-time check.

    I have an incentive to see whether the Linux desktop share has increased, so I usually run the survey to get my data point in. I also suppose the "gamer" crowd likes to show off how powerful their "rig" is, so I would imagine they commonly run the survey for that reason as well.
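The suggestion about recording how long it took to click "Ok" can be sketched as follows: attach the time to accept and whether the user opened the data view, then analyze careful reviewers and quick clickers separately. The field names and the 3-second threshold are hypothetical, and this only separates subgroups among people who said yes; it says nothing about people who declined.

```python
import time
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class ConsentPrompt:
    """Tracks how the user interacted with the consent popup."""

    shown_at: float = field(default_factory=time.monotonic)
    viewed_details: bool = False

    def open_details(self) -> None:
        self.viewed_details = True

    def accept(self, payload: dict, now: Optional[float] = None) -> dict:
        """Attach hypothetical consent metadata to the submitted payload."""
        now = time.monotonic() if now is None else now
        meta = {
            "seconds_to_accept": round(now - self.shown_at, 1),
            "viewed_details": self.viewed_details,
        }
        return {**payload, "consent_meta": meta}


def split_by_review(
    submissions: List[dict], min_seconds: float = 3.0
) -> Tuple[List[dict], List[dict]]:
    """Separate careful reviewers from reflexive 'Ok' clickers for analysis."""
    careful, quick = [], []
    for s in submissions:
        meta = s["consent_meta"]
        if meta["viewed_details"] or meta["seconds_to_accept"] >= min_seconds:
            careful.append(s)
        else:
            quick.append(s)
    return careful, quick
```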

> Opt-in makes the data useless - not just in terms of the huge drop in quantity but because of the fact it introduces a huge bias in the data selected - the people that would opt-in are probably not a good sample of "typical users".

Why? I don't think that's obvious. It may also be related to the way the opt-in is presented. In general, I would expect this to be a workable solution. Even if the opt-in group deviates from the "typical user", it's the best data you can get in an honest and ethically sound way. This should certainly be better than no data at all?

For any website/app that presents an opt-in cookie consent banner, this is implicitly already the case.

Yes, this is one of the main reasons people mostly build on the web. It's very difficult to make desktop software better, and Linux users especially are hostile to patterns that would make improvements possible.

>Opt-in makes the data useless

Hardly. It just has some issues with regards to what you also pointed out, bias for one. But it still provides valuable insight into usage patterns, systemic issues, and enables tracking effects of developments over time. Correcting the bias is not a bigger task than it is now - I'm sure you already have an idea about feedback to different features according to reviews, user reports, discussions, and so on. Opt-in is the same, just much better.
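One standard way to do that correction is post-stratification (a swapped-in technique, not something the commenter specified): if you know roughly how the whole user base splits along some dimension, e.g. OS share estimated from download counts, you can reweight the opt-in data to match. A minimal sketch, assuming those population shares are known:

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def poststratified_mean(
    samples: Iterable[Tuple[str, float]],
    population_share: Dict[str, float],
) -> float:
    """Reweight opt-in samples so each stratum counts by its share of all users.

    samples: (stratum, value) pairs from users who opted in.
    population_share: stratum -> fraction of the whole user base (assumed known).
    """
    sums: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for stratum, value in samples:
        sums[stratum] += value
        counts[stratum] += 1

    total, covered = 0.0, 0.0
    for stratum, share in population_share.items():
        if counts[stratum] == 0:
            continue  # no opt-in data here; this stratum cannot be corrected
        total += share * (sums[stratum] / counts[stratum])
        covered += share
    if covered == 0:
        raise ValueError("no overlap between samples and population strata")
    return total / covered  # renormalize over the strata actually observed
```

With 8 Linux and 2 Windows opt-ins, where 6 of the Linux users and none of the Windows users use some feature, the raw rate is 0.6; if Linux is assumed to be 20% of the user base, the reweighted estimate is 0.2 × 0.75 + 0.8 × 0 = 0.15. This only corrects bias along dimensions you can measure for everyone.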

Maybe the solution lies in radical transparency: explaining exactly how and why telemetry would help, then letting users decide. But even that requires trust...

Is there a GitHub API for creating issues? I also maintain a free, open-source app and would love to make it easy, when it crashes, to give users a button that opens a GitHub issue form, letting them see what crash data is populated and submit it if they want.
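GitHub does have a REST endpoint for creating issues (`POST /repos/{owner}/{repo}/issues`), but it requires an authenticated token. For a user-facing crash button, a simpler option is GitHub's support for prefilling the new-issue form through `title` and `body` query parameters: the user sees the populated form and decides whether to submit. A minimal sketch; `crash_issue_url`, the `example-org/example-app` repository, and the 4,000-character cap (to keep the URL a manageable length) are assumptions.

```python
from urllib.parse import urlencode


def crash_issue_url(
    owner: str, repo: str, title: str, crash_report: str, max_report_chars: int = 4000
) -> str:
    """Build a link to GitHub's new-issue form with crash details prefilled."""
    body = (
        "Crash report (please review and remove anything you don't want to share):\n\n"
        + crash_report[:max_report_chars]  # assumption: cap length to keep the URL short
    )
    query = urlencode({"title": title, "body": body})
    return f"https://github.com/{owner}/{repo}/issues/new?{query}"
```

In the app's crash handler, pass `traceback.format_exc()` as `crash_report` and open the result with `webbrowser.open(url)`; nothing is submitted until the user clicks the button on GitHub's form.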

Data collection and telemetry is sadly a lemon-market type of situation: the most trustworthy developers are precisely the ones who don't collect data from users.

This can only ever be opt-in if you want to stay on the legal side of the GDPR (and equivalents in other jurisdictions). You can ask, but the default needs to be "no" if no answer is given.

I provide telemetry data to KDE, because they default to collecting none, and KDE is an open-source and transparent project that I'd like to help if I can. If I used your app, I would be likely to click yes, since it's open-source. Part of the problem I have with projects collecting user data is the dark patterns used or the illegal opt-out mechanism, which will make me decline sending telemetry every time, or even make me ditch it for an alternative. An app that asks:

    Can we collect some anonymized data in order to improve the app?
    [Yes] [No]

...with equal weight given to both options, is much more likely to have me click Yes if none of the buttons are big and blue whilst the other choice is in a smaller font and "tucked away" underneath the other (or worse, in a corner or hidden behind a sub-menu).

Plus, I would think that SOME data would be better than NO data, even if there's an inherent bias leaning towards privacy-minded/power users.
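The two requirements in this comment, an equal-weight dialog and "no" whenever the user gives no answer, can be sketched in UI-agnostic Python. The `Button` type and the "neutral" style name are hypothetical stand-ins for a real toolkit's widgets.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

QUESTION = "Can we collect some anonymized data in order to improve the app?"


@dataclass(frozen=True)
class Button:
    label: str
    style: str = "neutral"  # hypothetical style name; both buttons share it


def consent_dialog() -> Tuple[str, Tuple[Button, Button]]:
    """Both choices get identical styling: no big blue 'Yes', no tucked-away 'No'."""
    return QUESTION, (Button("Yes"), Button("No"))


def resolve_consent(answer: Optional[str]) -> bool:
    """Only an explicit 'Yes' counts; closing the dialog or no answer means no."""
    return answer == "Yes"
```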

  • > This can only ever be opt-in if you want to stay on the legal side of the GDPR

    The GDPR only applies to personal data. You can collect things like performance data without opt-in (or even an opt-out option) as long as you are careful to not collect any data that can be used to identify an individual, so no unique device IDs or anything like that. Of course, you should be transparent about what you collect. You also have to be careful about combinations of data points that may be innocuous on their own but can be used to identify a person when combined with other data points.
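One way to check the combination problem described above is a k-anonymity test (a standard technique, not something the commenter named): every combination of potentially identifying fields must appear in at least k records before a batch is stored. A minimal sketch; the field names and k = 5 are assumptions, and passing this check does not by itself establish that data is anonymous under the GDPR.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical field names that would identify a device or person on their own.
DIRECT_IDENTIFIERS = {"device_id", "user_id", "ip_address", "email"}


def strip_direct_identifiers(record: Dict[str, object]) -> Dict[str, object]:
    return {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}


def rare_combinations(
    records: Iterable[Dict[str, object]],
    quasi_identifiers: List[str],
    k: int = 5,
) -> Set[Tuple[object, ...]]:
    """Return value combinations shared by fewer than k records.

    Each field may be harmless alone, but a rare combination (say, an unusual
    OS version + locale + screen size) can single out one person.
    """
    counts = Counter(tuple(r.get(f) for f in quasi_identifiers) for r in records)
    return {combo for combo, n in counts.items() if n < k}
```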