You can't build a moat with AI

10 months ago (generatingconversation.substack.com)

> The real differentiation lies in your data you feed into your models.

It's more than data. Steve Jobs used to tell the founders of the Segway that their secret technology would leak sooner or later, even if they set up their factories in the middle of a Nevada desert. His advice was that they needed to build a product users couldn't walk away from even if every competitor had the same technology (or so I remember). That could be a process, an ecosystem, or market penetration backed by an amazing supply chain (think of the world's largest seller of straws: the unit price of a single straw is close to zero, which is really hard to achieve), and so on.

To me, data will be one key component of the moat, but not all of it. The moat in AI is the same ol' entire ecosystem: infrastructure so efficient that the company can keep driving down the unit cost of hosting its models; a fabulous culture that lets the company keep churning out improvements; an extensive data platform, with the associated processes and sources, that keeps provisioning quality data; a number of killer applications...

“What that also means is that you don’t need to be an AI genius to succeed in building applications. With thoughtful software engineering and a focus on customer data, you’ll build a moat over time.”

This sounds encouraging at first glance, but it's even more demoralizing when you think about it. It doesn't matter how much cleverer you are than your competitors: if you don't have the data, you don't stand a chance. And of course the incumbents have the data, not you, the entrepreneur.

  • I don't see it that way. If you build a genuinely novel application then the critical data doesn't exist yet (IMO almost by definition). Sometimes it's easier for a startup to do this rather than for an incumbent who's trying to shoehorn the application into a preexisting framework (technical, operational, whatever).

    • If the data doesn't exist, you can't train a (neural-network) AI as they exist today.

      Whatever people build will be based on data that already exists, not data that will be created afterwards. And guess what: you don't have that data.

> We firmly believe the moat for AI application is in the data and the data engineering today. At some point, the process of building custom LLMs might get so fast and easy that we’ll all return to building our own models. That simply isn’t the case today.

Customized small models will outperform larger general models for your specific use case.

  • I did expect that too, but it isn't happening reliably.

    But small customized models seem to perform nearly as well as large general ones.

  • No they won't. Any model you train now will easily be beaten by GPT-5.

    • Past performance is not a predictor of future performance.

      While it's possible the gap between GPT-5 and GPT-4 will be as big as the one between 4 and 3, it's unlikely. The gap between 2 and 3 was much larger than the one between 3 and 4 (and similarly between 1 and 2).

      Also, it's not clear that GPT-5 will do this in an *economical way* once the spigot of investor money stops.

    • Not for most human language, or anything that requires business-specific context where what's publicly available lags behind the state of the art your business cares about.

      And of course, not if you care about token throughput more than fancy abilities. Or price for that matter.

      So for many if not most business needs, GPT-4 isn't the best tool out there, and GPT-5 is the canonical example of vaporware right now.

    • I think the real power of customized small models will be running things on local hardware, except that we're in an awkward phase where the local hardware isn't quite beefy enough to run anything really useful yet. Maybe Apple will do something interesting in that space at WWDC.

If you can't build a moat with technology, you build it politically, which Altman is already trying to do.

“For reference, our team at RunLLM has spent roughly 70% of our engineering cycles in the last quarter on data engineering — everything from pre-processing data at ingestion time to implementing hybrid search to reranking results.”

This, to me, is why I'm actually bullish on building ‘LLM wrappers’ as sustainable businesses. While you might not be developing new technology, you can add value for a client who can use ChatGPT to “talk to your docs” but can't do the surrounding work to make it frictionless for their workflow or company, and actually build a business. It's entirely possible that AI will get so good that it makes all the surrounding tasks easy too.
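To make the quoted data-engineering work concrete, here's a toy sketch of hybrid search with reranking. This is an illustration, not RunLLM's implementation: `keyword_score` stands in for a real lexical index like BM25, and `semantic_score` (a character-bigram cosine) stands in for embedding similarity; the documents and query are invented.

```python
import math
from collections import Counter

# Toy corpus: in a real system, these would be your customer's documents.
DOCS = [
    "how to reset your password in the admin console",
    "billing and invoice frequently asked questions",
    "reset a forgotten password via email link",
]

def keyword_score(query, doc):
    """Crude lexical score: fraction of query terms present in the doc."""
    q_terms = set(query.lower().split())
    d_terms = set(doc.lower().split())
    return len(q_terms & d_terms) / len(q_terms)

def semantic_score(query, doc):
    """Stand-in for embedding similarity: cosine over character bigrams."""
    def bigrams(text):
        text = text.lower()
        return Counter(text[i:i + 2] for i in range(len(text) - 1))
    q, d = bigrams(query), bigrams(doc)
    dot = sum(q[b] * d[b] for b in q)
    norm = math.sqrt(sum(v * v for v in q.values())) * \
           math.sqrt(sum(v * v for v in d.values()))
    return dot / norm if norm else 0.0

def hybrid_search(query, docs, alpha=0.5, top_k=2):
    """Blend lexical and semantic scores, then return the reranked top-k."""
    scored = [
        (alpha * keyword_score(query, d) + (1 - alpha) * semantic_score(query, d), d)
        for d in docs
    ]
    scored.sort(reverse=True)
    return [doc for _, doc in scored[:top_k]]

results = hybrid_search("reset password", DOCS)
```

The `alpha` blend is one simple fusion strategy; production systems often use reciprocal rank fusion or a learned cross-encoder reranker instead, but the structure (score twice, combine, rerank) is the same.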

This is probably why Google or Meta will "win" over OpenAI in the end.

They have all the data, and no one else even comes close.

...which is why OpenAI is focusing more on enterprise sales and unique value-propositions such as the GPT Store which can't trivially be imitated by competitors.

This post seems to misunderstand what a "moat" is in a business sense (and unfortunately a lot of AI hypesters on social media do as well). The fact that LLMs are becoming a commodity was the point of the original "OpenAI has no moat" memo by Google, which has proven to be accurate.

"It might feel like your applications’ prompts or prompt templates are a good form of differentiation. After all, your top-notch engineering team has invested days into tuning them to have the right response characteristics, tone, and output style. Of course, giving your competitors your prompts would probably accelerate their progress, but any good engineering team will figure out the right changes quickly. The main reason is that the experimentation (with the right evaluation data!) is quick and easy — trying a new prompt template isn’t much harder than writing it out. All it really takes is a little bit of patience, some creativity, and extra Azure OpenAI credits."

And yet, over and over, I see products whose output clearly comes from simple and frankly lazy prompting. You can do a lot with prompting, but engineers are not putting in the work! (If an engineer is even the right person... probably not, for any specific application of an LLM.)

Prompting also isn't so reductive that you just write an evaluation and then iterate on the prompt until you satisfy the evaluation. Prompting is a co-creative exercise between the LLM, the domain expert, the product, and the user. And sure "data" fits in there, as well as relationships, comprehensibility, workflows, etc etc... the AI component is just a small piece of any full application.
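The quoted point about experimentation, that trying a new prompt template isn't much harder than writing it out, amounts to a small eval loop. A hypothetical sketch: `call_llm` is a stub standing in for a real model API, and the eval set and templates are invented for illustration.

```python
# Hypothetical sketch: try each prompt template against a small evaluation
# set and keep the best. `call_llm` is a stub, not a real model API.

EVAL_SET = [
    {"input": "2+2", "expected": "4"},
    {"input": "capital of France", "expected": "Paris"},
]

TEMPLATES = [
    "Answer briefly: {input}",
    "You are a terse assistant. Reply with only the answer.\nQ: {input}\nA:",
]

def call_llm(prompt: str) -> str:
    # Stub: a real implementation would call a hosted model here.
    canned = {"2+2": "4", "capital of France": "Paris"}
    for question, answer in canned.items():
        if question in prompt:
            return answer
    return ""

def score_template(template: str) -> float:
    # Fraction of eval examples the template answers correctly.
    hits = sum(
        call_llm(template.format(input=ex["input"])) == ex["expected"]
        for ex in EVAL_SET
    )
    return hits / len(EVAL_SET)

best = max(TEMPLATES, key=score_template)
```

The loop is trivially cheap compared to fine-tuning, which is the article's point: a copied prompt buys a competitor little, because anyone with eval data can re-derive a good template quickly.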

You can build a moat and fill it with bullshit scraped from Reddit. That's all AI is anyway. Sorry to burst anyone's hype bubble.