Comment by master_crab
1 month ago
Yup. And it’s already playing out that way. Anthropic, OpenAI, Gemini (the last technically not an upstart): all have hyperscalers backing and subsidizing their model training (AWS, Azure, and GCP, respectively). It’s difficult to discern where the segmentation between compute and models is here.
> It’s difficult to discern where the segmentation between compute and models is here.
Startups can outcompete the foundational model companies by concentrating on creating a very domain-specific model, and providing the support and services that come out of having expertise in that specific domain.
This is why OpenAI chose to co-invest in cybersecurity startups with Menlo Ventures in 2022 instead of building its own dedicated cybersecurity vertical: a partnership-driven growth model nets the most profit for the least resources expended when you are trying to expand your TAM into a new and very competitive market like cybersecurity.
This is the same reason hyperscalers like Microsoft, Amazon, and Google themselves hold ownership stakes in foundational model companies like Anthropic, OpenAI, etc.: at hyperscaler size and revenue, foundational models are just a feature (an important feature, but a feature nonetheless).
Foundational models are a good start, but they are not 100% perfect in a number of fields and use cases. In my experience, tooling built with these models is often used to cut headcount by 30-50% for the team using it to solve a specific problem. And this is why domain-specific startups still thrive: sales, support, services, etc. will still need to be tailored for buyers.
All of what you wrote is mostly true, except that "not 100% perfect in a number of fields and use cases" is quite an understatement. You mention the cybersecurity vertical. As a data point, I put the simplest code-security-analysis question to ChatGPT (4o mini, for those who might say to wait until the next one comes out). I wrote a novel vulnerable function, so that it could never have been seen before, and I chose a very simple and easy vulnerability. Scores of security researchers in my vicinity spotted the vulnerability trivially and instantly. ChatGPT was worse than useless, failing miserably to perform any meaningful analysis.

The above is anecdotal data; it could be that a different tool would perform better. However, even if such models were harnessed by a startup to solve a specific problem, there is absolutely no way for present capabilities to yield a 30-50% headcount reduction in this subdomain.
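The original function wasn't shared, but for illustration, here is my own sketch of the *kind* of deliberately simple, spot-it-instantly bug being described (a hypothetical path-traversal flaw; the function and names are mine, not the commenter's):

```python
import os.path

def read_user_file_path(base_dir: str, filename: str) -> str:
    # Intended check: reject absolute paths from the caller.
    # Bug: "../" sequences are never rejected, so a caller can escape
    # base_dir with e.g. "../../etc/passwd" -- a classic path traversal
    # any human reviewer would flag on sight.
    if filename.startswith("/"):
        raise ValueError("absolute paths not allowed")
    return os.path.normpath(os.path.join(base_dir, filename))

# The "validated" path lands outside base_dir:
print(read_user_file_path("/srv/app/data", "../../etc/passwd"))
```

The point is not this particular bug class; it's that a vulnerability this obvious to a human reviewer is the low bar the model failed to clear.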
I agree. Foundational models suck at the high-value security work that is needed.
That said, the easiest proof of value for foundation models in security today is automating the SOC function: auto-generating playbooks, stitching together context from various signal sources, and auto-summarizing an attack.
This reduces the need to hire a junior SOC analyst, and it is a workflow that has already been adopted (or is in the process of being adopted) by plenty of F500s.
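The "stitching context and auto-summarizing" piece can be made concrete with a minimal sketch. This names no real product; the `Signal` record, the source labels, and the timeline format are all my own illustrative assumptions about what such glue work looks like:

```python
from dataclasses import dataclass

@dataclass
class Signal:
    source: str      # hypothetical signal source, e.g. "EDR", "IdP"
    timestamp: int   # epoch seconds
    detail: str

def summarize_attack(signals: list[Signal]) -> str:
    # Stitch signals from different tools into one chronological
    # timeline -- the junior-analyst correlation work described above.
    ordered = sorted(signals, key=lambda s: s.timestamp)
    return "\n".join(f"[{s.timestamp}] {s.source}: {s.detail}" for s in ordered)

signals = [
    Signal("firewall", 1700000300, "outbound beacon to known C2 IP"),
    Signal("IdP", 1700000100, "impossible-travel login for alice"),
    Signal("EDR", 1700000200, "powershell spawned by winword.exe"),
]
print(summarize_attack(signals))
```

In a real deployment the model's job is the fuzzy part (normalizing free-text alerts into something like `Signal` and writing the narrative summary); the correlation itself is mostly mechanical, which is why this is the tractable use case.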
At the end of the day, foundational models cannot reason yet, and that kind of capability is still far away.
4o mini is a weak, old model. I’m pretty sure even regular o1 (non-pro) would be able to crack a simple vuln.
> by concentrating on creating a very domain specific model
I don’t disagree with this from an economics perspective (it’s expensive to run an FM to handle domain-specific queries). But the most accurate domain knowledge always tends to involve internal data, and then it becomes the issue raised above: a people problem involving internal knowledge and data management.
Incumbent hyperscalers and vendors like Microsoft, Amazon, etc. (and even third-party data managers like Snowflake) tend to have more leverage when it becomes this type of data problem.
>Startups can outcompete the foundational model companies by concentrating on creating a very domain-specific model, and providing the support and services that come out of having expertise in that specific domain.
Well put, because the business is focused and to the point from the beginning.
For those applications where this gets you in the door to the domain, or gets you in sooner, it can be a competitive advantage. I think Lukas is pointing out the longer-term limitations of the approach, though. I came to the same conclusion from 1980s electronics myself.
You could edit this, however:
>Startups can [prosper] by concentrating on creating a very domain-specific model, and providing the support and services that come out of having expertise in that specific domain.
And it may hold true anyway, and you may have a lifetime of work ahead of you whether or not the more generalized capabilities catch up. You don't always have to be competitive with heavily capitalized corporations in the market if you are adding real value to begin with, and the sky can still be the limit.
>the most accurate domain knowledge always tends to involve internal data.
>Incumbent hyperscalers . . . tend to have more leverage when it becomes this type of data
Those can serve as benchmarks for gauging when a person or small team actually can, occasionally, outperform a billion-dollar corporation in some way or another.
I'm no Mr. Burns, but to this I have slowly said to myself "ex-cel-lent" similarly for decades.
It's good to watch AI approaches come and go and even better to be adaptable over time.
Interestingly, this is the exact opposite of the point the article makes — which is that over time, more general models and more compute are more capable, and by building a domain-specific model you just build a ceiling past which you can’t reach.
This is not the same as having unique access to domain-specific data, which becomes more valuable as you run it through more powerful domain-agnostic models. It sounds like this latter point is the one you say has value for startups to tackle.
> This is not the same as having unique access to domain-specific data, which becomes more valuable as you run it through more powerful domain-agnostic models. It sounds like this latter point is the one you say has value for startups to tackle.
Exactly!