docrouter.ai can be installed on-prem. If you use the SaaS version, users collaborate in separate workspaces, modeled on how Databricks supports workspaces. The backend DB is MongoDB, which keeps things simple.
One level of privacy is the workspace-level separation in MongoDB. But if there is customer interest, other setups are possible. E.g., the way Databricks handles privacy is by giving each account its own backend services and scoping workspaces within the account.
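For illustration, here's a minimal sketch of what workspace-level separation can look like at the query layer (the collection and field names are assumptions, not docrouter.ai's actual schema):

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
db = client["docrouter"]  # hypothetical database name

def get_documents(workspace_id: str, user_id: str):
    """Fetch documents, always scoped to a single workspace.

    Every query filters on workspace_id, so one workspace's data is
    never visible from another. The isolation lives in the query
    layer, on top of a shared database.
    """
    # Hypothetical membership check: the user must belong to the workspace.
    membership = db.workspace_members.find_one(
        {"workspace_id": workspace_id, "user_id": user_id}
    )
    if membership is None:
        raise PermissionError("user is not a member of this workspace")

    # The workspace_id filter is mandatory on every data query.
    return list(db.documents.find({"workspace_id": workspace_id}))
```

The Databricks-style alternative pushes the same boundary down a level: a separate backend (and database) per account, with workspaces scoped inside it.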
That is a good possible model.
We work with Fortune 500s in sensitive industries (healthcare, fintech, etc.). Our policies are:
- data is never shared between customers
- data never gets used for training
- we also configure data retention policies to auto-purge data after a set period (see the sketch after this list)
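In MongoDB terms, one common way to implement that kind of auto-purge is a TTL index. A sketch, with made-up collection and field names:

```python
from datetime import datetime, timezone
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
db = client["docrouter"]  # hypothetical database name

# TTL index: MongoDB's background monitor deletes any document roughly
# expireAfterSeconds after its created_at timestamp.
RETENTION_SECONDS = 90 * 24 * 3600  # e.g. a 90-day retention policy
db.documents.create_index("created_at", expireAfterSeconds=RETENTION_SECONDS)

# Documents must carry a BSON date field for the TTL index to apply.
db.documents.insert_one({
    "workspace_id": "ws-123",
    "content": "example document text",
    "created_at": datetime.now(timezone.utc),
})
```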
But how do you get these guarantees from the upstream vendors? Or do you run the LLMs on-premises?
If you're using LLM APIs, the vendors offer SLAs guaranteeing that your inputs won't be used as training data, along with other assurances. These endpoints generally cost more to use (essentially a compliance fee), but they solve the problem.
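Worth noting that nothing in the request itself usually changes; the no-training guarantee is contractual and tied to which endpoint and account you call. A sketch of routing traffic to such an endpoint (the env var names and model are hypothetical, not a specific vendor's setup):

```python
import os
from openai import OpenAI

# The compliance guarantees come from the vendor's enterprise agreement,
# not from a request parameter. In code, what typically changes is only
# which endpoint and API key you point at.
client = OpenAI(
    base_url=os.environ["LLM_COMPLIANCE_BASE_URL"],  # enterprise endpoint covered by the SLA
    api_key=os.environ["LLM_COMPLIANCE_API_KEY"],
)

response = client.chat.completions.create(
    model="gpt-4o",  # model name is an assumption
    messages=[{"role": "user", "content": "Extract the invoice fields from this document."}],
)
print(response.choices[0].message.content)
```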