Comment by Intox

3 years ago

Another great tool that solves the exact problem we want to solve, using an external service we can't use.

No company of a decent size (the ones that actually reach some complexity of documentation) will be okay with exfiltrating confidential information to an external service it has no deal or NDA with. Sure, OpenAI is easy to integrate, but it's also an absolute showstopper for a company.

We don't need state-of-the-art LLMs with 800k context; we need confidentiality.

I'm kinda confused by this. Every company already keeps their data in Google Docs, Notion, Slack, Confluence, Jira, or any number of other providers. When you sign up for one of these services, there's always a compliance step to make sure it's ok. OpenAI's TOS says they don't use API data for training. So what makes sending this data to OpenAI different than sending it to any of the above providers? This is an honest question. I don't understand the difference.

  • > Every company already keeps their data in Google Docs

    The TOS for the (paid) enterprise products such as Google Workspace are totally different from those of the (free) consumer versions. For example, Google can't use the data for AI training.

    • The TOS of the OpenAI API (which tools like this use) do not allow model training on the data either. You might be confusing their API with ChatGPT, which has a different policy.

  • For our part, we self-host Confluence and GitLab, have tons of internal documentation and web pages, and are prohibited from using external tools unless they can be hosted internally in a sandboxed manner. There's no way on the planet they would approve connecting to an OpenAI API to trawl through internal documentation.

    • There are open source models that perform pretty well for a chatbot over internal documentation. If you're interested, feel free to reach out to me.

  • Trust. OpenAI has ignored everyone's copyright and legal usage terms for the rest of their training data. What lawyer is going to trust them to follow their contractual terms?

  • Why would you send your data to the company that built its value by slurping up everyone's data without consent? It doesn't matter what they promise now. They have shown that they don't care about intellectual property, copyright, or any of that. They literally cannot be trusted.

Two weeks ago I finished a project for a client who wanted a "talk to your documents" application that uses open source models running on their own infrastructure instead of OpenAI or other third-party APIs.

If you're interested in something similar, send me an email.