Comment by dimal

3 years ago

I'm kinda confused by this. Every company already keeps its data in Google Docs, Notion, Slack, Confluence, Jira, or any number of other providers. When you sign up for one of these services, there's always a compliance step to make sure it's ok. OpenAI's TOS says they don't use API data for training. So what makes sending this data to OpenAI different from sending it to any of the above providers? This is an honest question. I don't understand the difference.

> Every company already keeps their data in Google Docs

The TOS for the (paid) enterprise products such as Google Workspace are totally different from the (free) consumer versions. For example, Google can't use the data for AI training.

  • The TOS of the OpenAI API (which tools like this use) don't allow model training on the data either. You might be confusing their API with ChatGPT, which has a different policy.

    • The important point being: with Google, Notion, Slack, Confluence, etc., your company has an actual contract with the vendor, properly signed, with provisions about data handling that your company (unlike you as an individual) can effectively enforce. There's an actual relationship created here, with benefits and losses flowing both ways.

      The Terms of Service? They're worth less than it costs to print them out.

      Case in point: right now, Microsoft is repackaging OpenAI models on their Azure platform and raking it in - the main value proposition here is literally just that it's "OpenAI, but with a proper contract and an SLA". But companies happily pay up, because that's what makes the difference between "reliable and safe to use at work" and "violating internal and external data safety standards, and in some cases flat-out illegal".

      1 reply →

    • IANAL, but I read the OpenAI API TOS earlier today, and they keep data for up to 30 days for "review", and multiple people can get access to it. If I had confidential data I would not send it to them. Microsoft, on the other hand, seems to have an option where absolutely no data is stored for their OpenAI service.

      2 replies →

For our part, we self-host Confluence and GitLab, have tons of internal documentation and web pages, and are prohibited from using external tools unless they can be hosted internally in a sandboxed manner. There's no way on the planet they would approve connecting to an OpenAI API for trawling through internal documentation.

  • There are open source models that can work pretty well as a chatbot over internal documentation. If you're interested, feel free to reach out to me.

Trust. OpenAI ignored everyone's copyright and legal usage terms for the rest of their training data, so what lawyer is going to trust them to follow their contractual terms?

Why would you send your data to the company that built its value by slurping up everyone's data without consent? It doesn't matter what they promise now; they have shown that they don't care about intellectual property, copyright, or any of that. They literally cannot be trusted.

  • Isn't this what Google search does? Yet Google Docs, Gmail, etc are all OK?

    • It doesn't matter either way. What matters is that Google offers proper enterprise contracts - contracts that are enforceable and transfer a lot of legal liability to the vendor. OpenAI, generally, does not offer such things.

      Google Search itself is a somewhat special case - it gets a free pass because of its utility and because you're unlikely to paste anything confidential into a search box. But there are many places where even Google Search is banned on data security grounds.

      OpenAI's offerings - ChatGPT, the playground, and the API - all very much encourage pasting large amounts of confidential information into them, which is why any organization with a minimum of legal sense is banning or curtailing their use.