Comment by lelanthran

2 days ago

> I made 2 posts in this thread regarding why I think they have a moat. Was there anything ambiguous or that you disagreed with?

I'm afraid I don't see those posts; I see 2x posts from you asserting they have a moat, but not why you think they have a moat.

I distinguish between "They have a moat." and "This is why $FOO, $BAR and $BAZ form a moat."

Maybe you think brand recognition is a moat, but that didn't work out for incumbents before (too many examples to list).

It was kind of buried in my second post:

> They have internal scale and scope economies as the breadth of synthetic data expands.

These frontier labs will have a hundred or a thousand teams of people+AI working in parallel generating synthetic data to solve different niches. A few teams solve computer use. A few teams solve math. A few teams solve various games. So the org is basically a big machine that mints data, and model research is only a small part of it. Scale then is the moat.

The second leg of the moat thesis is that open weights competition will die off soon because the cost to keep up with the scale will be too excessive.

The third leg of the moat thesis is that customers are happy to pay big margins for differences that appear small if the benchmark is the measuring stick.

If the paradigm was still scrape internet -> train model, I'd agree that there is no moat.

  • I disagree that the model is a moat; distillation of models is going to happen, and even without it all the current players have models that are virtually indistinguishable for the use-case.

    Model capabilities have converged over time, and I don't see this trend reversing. OpenAI owns only the model.

    The provider who does have a moat is Google - they own the entire vertical, from the hardware, to the training data, they have it all.

    OpenAI has to buy GPUs, Google makes them.

    OpenAI has to rent data centers. Google owns them.

    OpenAI has to scrape the web for all of its training data. Google's collection of user emails alone (not counting its Android data harvesting, ad-data harvesting, user tracking, etc.) gives it a ton of training data that will never be available to scrapers.

    Google has billions of signed-in users; OpenAI has to market to and attract users (800M last I checked, but that growth was also flattening out toward an asymptote).

    That's what a moat looks like. Better technology and/or better results have never, in my memory, been a moat.

    • Good points about Google.

      I think where I don't agree is about the model. You're mostly correct right now, and your view is supported by how close everyone is.

      Where I am more optimistic about the 2-4 biggest labs (not just OpenAI) is what the next 2 years looks like.

      I expect this to happen:

      - Synthetic data goes from 30% of training data to 90-97%+ of training data.

      - Synthetic data becomes hugely varied, and the production of it is factory-like and parallelized.

      The moat here is the data factory, and the scale/scope economies behind it.

      Thoughts?

      1 reply →