
Comment by jameshart

2 days ago

Counterpoint: you should not be connecting to your production database, you should not be running non-critical queries on production database servers, and you probably shouldn’t have permission to see all this data about your users.

Obviously your mileage may vary; your scale is your own, and your trade-offs are your own trade-offs.

But be aware that there comes an operational scale where this is not an acceptable way - operationally, legally, privacy-wise - to investigate customer issues, and you’ll need different tricks.

Counter-counterpoint: for most databases, that “operational scale” will never come.

  • Scale isn’t just measured in transactions per second - it’s also measured in dollars, and compliance risk, and legal exposure.

https://www.xkcd.com/1737

If you're OBVIOUSLY not the target audience, you don't have to dismiss it just because it doesn't fit your use case. There are probably a thousand "apps" where this is just fine for every one "Sry, we work with the government or are a planet-scale app" you're talking about.

It's exhausting to read dismissive online dick-measuring comments. If you have the issues you're describing, you already know this doesn't apply to you. It's on the same level as "Bro, I asked an LLM a question and it gave an interesting answer, and I'm unique because nobody but me can ask LLMs questions like I can" style posts.

  • I don’t think I was being dismissive, I was just pointing out the lack of universal applicability of this suggestion.

    It is my experience that many people do not realize that it is possible not to have developers just connect to prod databases with admin privs.

    Pointing out that there comes a point where this sort of approach isn’t the norm is part of how people who reach that level of scale learn that. https://xkcd.com/1053/

    And that level of concern isn’t reserved for planet-scale - once you have a couple of million-dollar contracts on your B2B SaaS platform, you should be taking production data ops seriously enough that this sort of approach is unlikely to make sense.

    And I shouldn’t need to say that user privacy ought to be a concern even for small operations.

    • > It is my experience that many people do not realize that it is possible not to have developers just connect to prod databases with admin privs.

      Dismissive. Everyone knows this; they probably just can't be arsed or don't care.

      > Pointing out that there comes a point where this sort of approach isn’t the norm is part of how people who reach that level of scale learn that. https://xkcd.com/1053/

      Not everyone has these ambitions

      > And that level of concern isn’t reserved for planet-scale - once you have a couple of million-dollar contracts on your B2B SaaS platform, you should be taking production data ops seriously enough that this sort of approach is unlikely to make sense.

      Sure, but here again you're talking about "seriousness" with the same dismissive "I'm better" tone. Your use case and the business you work for don't reflect what everyone else is doing.

      > And I shouldn’t need to say that user privacy ought to be a concern even for small operations.

      Depends a lot on what PII you're collecting. But rather than stating "You shouldn't collect PII you don't need", since I don't know your use case, I'll say "I try to minimize the PII I collect so I don't have to deal with these issues yet".

What are the tricks for investigating customers' data that don't violate privacy?

  • IANAL, but two ideas come to mind:

    1) What I do for my small app is make a copy of the prod database and randomize nearly all the data: all the PII, phone numbers, email addresses, names, etc. All the relationships between the data are preserved, so I can usually still repro whatever issue. I don't know if this would satisfy the lawyercats, but I think it's a decent start.

    2) If I had more time/money, I'd build a specialized "Customer Support" app that gives limited access to customer data. The customer would have to provide consent before a support worker could access their data, and this would be logged/audited. No one would have direct access to the prod DB.

    • 1) We regularly receive bad data that we have to debug: financial transactions whose descriptions and/or dates change, and unique IDs that are sometimes present and sometimes missing.

      2) We have those: a support app for the support team, and when they kick issues up to us backend devs, we have our own tools to try to debug. I have no idea whether those tools are correct; I'd need to compare their output against the prod DB to verify. Furthermore, we can do all the spying/privacy violating we want with the debug tools. We just can't debug when things go wrong.
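Idea 1 from the IANAL comment (a scrubbed copy of prod that keeps relationships intact) might look roughly like the Python sketch below, with SQLite standing in for the real database. Everything here is illustrative: the `customers`/`orders` schema, the `PII_COLUMNS` map, and the helpers `pseudonym` and `scrubbed_copy` are hypothetical. One deliberate swap from the comment: instead of purely random values it uses keyed HMAC-SHA256 pseudonyms, so the same email appearing twice still matches after scrubbing. That is pseudonymization rather than true anonymization, so it doesn't settle the lawyercats question either.

```python
import hashlib
import hmac
import sqlite3

# Hypothetical schema knowledge: which columns hold PII. These names are
# constants we control, so building SQL from them with f-strings is safe here.
PII_COLUMNS = {"customers": ["name", "email", "phone"]}

def pseudonym(secret: bytes, column: str, value: str) -> str:
    """Keyed, deterministic fake value: equal inputs map to equal outputs."""
    digest = hmac.new(secret, f"{column}:{value}".encode(), hashlib.sha256).hexdigest()[:12]
    if column == "email":
        return f"user_{digest}@example.invalid"  # still shaped like an email
    return f"{column}_{digest}"

def scrubbed_copy(src: sqlite3.Connection, secret: bytes) -> sqlite3.Connection:
    """Copy the whole database, then overwrite PII columns in the copy only."""
    dst = sqlite3.connect(":memory:")
    src.backup(dst)  # ids and foreign keys come across unchanged
    for table, columns in PII_COLUMNS.items():
        select = f"SELECT rowid, {', '.join(columns)} FROM {table}"
        update = f"UPDATE {table} SET {', '.join(c + ' = ?' for c in columns)} WHERE rowid = ?"
        for rowid, *values in dst.execute(select).fetchall():
            fakes = [None if v is None else pseudonym(secret, c, str(v)) for c, v in zip(columns, values)]
            dst.execute(update, (*fakes, rowid))
    dst.commit()
    return dst

# Demo: an in-memory database stands in for a prod snapshot.
prod = sqlite3.connect(":memory:")
prod.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, email TEXT, phone TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER REFERENCES customers(id), total REAL);
INSERT INTO customers VALUES (1, 'Ada Lovelace', 'ada@example.com', '555-0100');
INSERT INTO orders VALUES (10, 1, 42.0);
""")
dev = scrubbed_copy(prod, secret=b"not-stored-alongside-the-copy")
print(dev.execute("SELECT c.name, o.total FROM orders o JOIN customers c ON c.id = o.customer_id").fetchall())
```

The join still works because only the PII columns are rewritten; the secret should live somewhere the dev copy's users can't read, otherwise short values like phone numbers can be brute-forced back.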
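Idea 2 from the IANAL comment (consent-gated, audited support access) can also be sketched in a few lines. This is a hypothetical design, not anyone's actual tool: the `consents` and `audit_log` tables, `SUPPORT_FIELDS`, `ConsentRequired`, and `support_lookup` are invented names, SQLite stands in for the support tool's database, and timestamps are plain Unix seconds. The property it tries to show is that every lookup is logged, including denied ones, and staff only ever get a fixed subset of columns.

```python
import sqlite3
import time

# Hypothetical schema for the support tool's own database.
SCHEMA = """
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, email TEXT, plan TEXT);
CREATE TABLE consents (customer_id INTEGER, agent TEXT, expires_at INTEGER);
CREATE TABLE audit_log (at INTEGER, agent TEXT, customer_id INTEGER, allowed INTEGER);
"""

# Support staff only ever see these columns (an illustrative choice).
SUPPORT_FIELDS = ("id", "plan")

class ConsentRequired(Exception):
    """Raised when the customer has not granted this agent access."""

def support_lookup(conn, agent, customer_id, now=None):
    now = int(time.time()) if now is None else now
    granted = conn.execute(
        "SELECT 1 FROM consents WHERE customer_id = ? AND agent = ? AND expires_at > ?",
        (customer_id, agent, now),
    ).fetchone() is not None
    # Log every attempt, including denied ones, before returning anything.
    conn.execute("INSERT INTO audit_log VALUES (?, ?, ?, ?)", (now, agent, customer_id, int(granted)))
    conn.commit()
    if not granted:
        raise ConsentRequired(f"{agent} has no active consent for customer {customer_id}")
    cols = ", ".join(SUPPORT_FIELDS)  # constant names, safe to interpolate
    return conn.execute(f"SELECT {cols} FROM customers WHERE id = ?", (customer_id,)).fetchone()

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO customers VALUES (1, 'Ada', 'ada@example.com', 'pro')")
now = 1_700_000_000
try:
    support_lookup(conn, "agent_bob", 1, now=now)
except ConsentRequired:
    pass  # denied, but the attempt is recorded in audit_log
conn.execute("INSERT INTO consents VALUES (1, 'agent_bob', ?)", (now + 3600,))  # customer grants one hour
print(support_lookup(conn, "agent_bob", 1, now=now))  # (1, 'pro')
```

Because consent expires, the same agent is denied again once the hour is up, and the audit log shows both the grant window and every access attempt.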

This is not unreasonable, and will be standard practice at many large companies. Judging from downvotes some of us are just too cool for that.

I connect to a production replica read-only. Many coworkers aren't even allowed that. Any DDL change has to go through reviews, approvals, etc., which is too much trouble, so I just keep a set of queries in git.

Also, any view defined in the db becomes a dependency that people are scared of breaking, because who is using it, and for what? This becomes especially true when random things connect to your db (bad, but too late to do anything about it...). Now you can't change it without everyone yelling at you, and worse yet, a necessary data migration that affects such views means you have to fold them into your data migration project.
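One partial answer to "who is using this view?" before a migration, at least for views defined inside the database, is to walk the view definitions. The sketch below is a text-matching heuristic for SQLite (`views_referencing` and `dependent_views` are hypothetical helpers): it looks for the table name as a whole word in each view's SQL, then repeats for views built on those views. It can't see the external clients that query the views, and it can false-positive on column names or string literals; in PostgreSQL, `information_schema.view_table_usage` gives a catalog-based answer instead of text matching.

```python
import re
import sqlite3

def views_referencing(conn: sqlite3.Connection, name: str) -> list[str]:
    """Views whose SQL mentions `name` as a whole word (heuristic, not a parser)."""
    pattern = re.compile(rf"\b{re.escape(name)}\b", re.IGNORECASE)
    rows = conn.execute(
        "SELECT name, sql FROM sqlite_master WHERE type = 'view' ORDER BY name"
    ).fetchall()
    # Skip a view's own definition, which always mentions its own name.
    return [view for view, sql in rows if view != name and sql and pattern.search(sql)]

def dependent_views(conn: sqlite3.Connection, table: str) -> list[str]:
    """Views that depend on `table` directly or through other views."""
    found: list[str] = []
    frontier = [table]
    while frontier:
        current = frontier.pop()
        for view in views_referencing(conn, current):
            if view not in found:  # also prevents revisiting a view
                found.append(view)
                frontier.append(view)
    return found

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
CREATE TABLE order_notes (order_id INTEGER, note TEXT);
CREATE VIEW big_orders AS SELECT * FROM orders WHERE total > 1000;
CREATE VIEW big_order_count AS SELECT COUNT(*) AS n FROM big_orders;
CREATE VIEW notes_only AS SELECT note FROM order_notes;
""")
print(dependent_views(conn, "orders"))  # ['big_orders', 'big_order_count']
```

The transitive step matters: changing `orders` here would also break `big_order_count`, even though that view never names `orders` directly.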

Working on giant corporate legacy systems is painful but it pays the bills...