Comment by apothegm

3 days ago

And also something that it’s dangerous to try to do stochastically.

It's going to be stochastic in some sense whether you want it to be or not; human error never reaches zero percent. I would bet you a penny you'd get better results doing one two-second automated pass + your usual PII redaction than your PII redaction alone.

  • The advantage of computers was that they didn't make human errors; they did things repeatedly, quickly, and predictably. If I'm going to accept human error, I'd like it to come from a human.

    • > The advantage of computers was that they didn't make human errors;

Sure they do: computers repeatedly, quickly, and predictably do what they are programmed to do, which includes any human errors in that programming.

      1 reply →

• I think the problem is most secrets aren't stochastic; they're deterministic. When the user types in the wrong password, it should be blocked. Using a probabilistic model suggests an attacker now only needs to be really close, not exactly correct.

Sure, there's some math that says being really close and being exact aren't a big deal; but then you're also saying your secrets don't need to be exact when decoding them, and they absolutely do atm.

It sure looks like a weird privacy veil that might sorta work for some things, like frosted glass. But think of a toilet stall made entirely of frosted glass: are you still comfortable going to the bathroom in there?

    • I dunno what use case you're thinking this is for.

      The use case for this is that many enterprise customers want SaaS products to strip PII from ingested content, and there's no non-model way to do it.

Think of ingesting call transcripts, where those calls may include credit card numbers or other private data. The transcripts are very useful for various things, but for obvious reasons we don't want to ingest the PII.
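That kind of pipeline is usually two stages: a deterministic regex pass for patterns you can express exactly, then a model pass for everything else. A minimal sketch, where `model_spans` is a hypothetical callback standing in for whatever NER model you'd actually plug in:

```python
import re
from typing import Callable, List, Tuple

# Stage 1: deterministic patterns we can match exactly.
PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "SSN":   re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def redact(text: str,
           model_spans: Callable[[str], List[Tuple[int, int]]]) -> str:
    """Regex pass first, then a model pass for PII the regexes
    can't express (names, addresses, free-form account details)."""
    for label, pat in PATTERNS.items():
        text = pat.sub(f"[{label}]", text)
    # Apply model-detected spans right-to-left so earlier
    # offsets stay valid as the text shrinks.
    for start, end in sorted(model_spans(text), reverse=True):
        text = text[:start] + "[PII]" + text[end:]
    return text
```

Usage with a stub in place of a real model:

```python
clean = redact("mail a.b@x.com, ask for Bob", lambda t: [(24, 27)])
```

The deterministic stage is the part you can trust completely; the model stage is the stochastic layer the thread is arguing about, and keeping them separate means a model failure never un-catches what the regexes already handled.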

      1 reply →