Comment by mattlondon

2 years ago

This is probably why Google or Meta will "win" over OpenAI in the end.

They have all the data, and no one else even comes close.

12 comments

mattlondon

Microsoft has plenty of data too: Microsoft Teams, LinkedIn posts and messages, and Outlook emails.

AnthonyMouse 2 years ago
Nobody has explained how they could use that data without producing a model that would emit private information.
- abrichr 2 years ago
  
  Perhaps de-identification before training could be helpful here.
  Microsoft does seem active in this, e.g. https://microsoft.github.io/presidio/
  
Havoc 2 years ago

And if they use any of it, the entire world's corporate lawyers will show up on their doorstep.
Unlike Google's victims (individuals), corporations can and do fight back when someone plays fast and loose with their confidential comms.
endofreach 2 years ago

I wouldn't worry about Microsoft delivering quality in any way.
hypoxia87 2 years ago

Plus every company's files in OneDrive and SharePoint.
ilovetux 2 years ago

Don't forget they have GitHub as well.
dumbo-octopus 2 years ago

Microsoft 365 (née Office 365) as well. And Dynamics 365. And GitHub. And OneDrive. And SharePoint. And Power Platform.
Honestly, I think they might have more useful data than Google, given that Bing knows more or less the same as Googlebot. Meta doesn't come close, unless you want your LLM to be purely conversational.

Large companies like Google will fail at making the most successful LLMs because of internal cultural problems.

altdataseller 2 years ago
Go away ChatGPT
- sumeruchat 2 years ago
  
  So annoying. I feel sorry for people who actually read that paragraph.