Comment by cphoover

19 hours ago

How many people are giving an LLM Agent full read access to their production data? That seems nuts to me.

Evan here, from Ardent.

It's not uncommon (hex.ai and others do this, as do developers, MCP tools, etc.). One thing we do at Ardent is enable obfuscated read replicas. We can strip PII in the replicas, so your agents operate on realistic (but not sensitive) data. Moreover, they can do so in a way that doesn't impact your production database and is fast enough to wire into your CI/CD processes.

Jeremy is correct, though. The main concern is agents with write access. There have been two high-profile instances in the last year of agents dropping production databases (in one case, even after being given explicit instructions never to do such a thing). While read replicas of a primary DB solve the "agents can't destroy things" problem, they don't solve things like testing schema migrations (in particular) or updates to the data.

Business side people install Claude, find it fantastic, read about postgres and BigQuery MCP, and immediately demand it.

Small enough company without suitable MoC and they've got a real chance of getting it.

I'm much more worried about people who give full write access to their agents! But at least this solves that problem.

  • Jedberg... Wow an internet legend replied to me! ><

    > I'm much more worried about people who give full write access to their agents! But at least this solves that problem.

    Yeah it goes without saying that write access would be crazy... But, it seems like people don't really care about the fact that they are just giving their private data to companies like Anthropic, OpenAI and Google.

    > Branch anonymization Branches default to a full copy of your production data.

    <-- This doesn't seem a safe default to me...

    Perhaps a data policy should be required to be in place before a branch can be cloned... A default configuration that gives the LLM full prod data access is a bad standard to set, I think.

    • > Jedberg... Wow an internet legend replied to me!

      Hey, I put on my pants the same way you do: by having my staff hold them up while I jump into them.

      > But, it seems like people don't really care about the fact that they are just giving their private data to companies like Anthropic, OpenAI and Google.

      This isn't quite as risky as it seems. All of them have a TOS that says if you pay them enough money they won't train on your data. But you're right that there are probably a lot of people who aren't on those plans sharing private data.

      > > Branch anonymization Branches default to a full copy of your production data.

      > <-- This doesn't seem a safe default to me...

      Agreed, and I'm sure it will cause trouble if the copies don't also carry over the internal controls around access logging.

      But also, for smaller companies, this isn't an issue since they don't have SOC2 and the other compliance needs yet. So it's probably a sane starting place for Ardent at this time. Most small startups let everyone in the company access the full database anyway.

      > Perhaps a data policy should be required to be in place before a branch can be cloned... A default configuration that gives the LLM full prod data access is a bad standard to set, I think.

      Or at least an easy way to copy it from the database you're branching from.
