Comment by CharlieDigital
1 month ago
There's only one core problem in AI worth solving for most startups building AI powered software: context.
No matter how good the AI gets, it can't answer questions about what it doesn't know. It can't perform a process for which it doesn't know the steps or the rules.
No LLM is going to know enough about some new drug in a pharma's pipeline, for example, because it doesn't know about the internal resources spread across multiple systems in an enterprise. (And if you've ever done a systems integration in any sufficiently large enterprise, you know that this is a "people problem" and usually not a technical problem).
I think the startups that succeed will understand that it all comes down to classic ETL: identify the source data, understand how to navigate systems integration, pre-process and organize the knowledge, train or fine-tune a model or have the right retrieval model to provide the context.
There's fundamentally no other way. AI is not magic; it can't know about trial ID 1354.006 except through what it was trained on and what it can search for. Even coding assistants like Cursor are really solving a problem of ETL/context, and always will be. The code generation is the smaller part; getting it right requires providing the appropriate context.
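To make that concrete, here's a minimal sketch of that pipeline in Python. The source systems, the sample records, and the keyword-overlap scoring are all made-up stand-ins; a real pipeline would use proper connectors, chunking, and an embedding model:

```python
# Minimal sketch of the ETL-to-context pipeline described above.
# Sources, records, and scoring are illustrative stand-ins.

def extract() -> list[dict]:
    # Identify the source data: pull raw records from internal systems.
    return [
        {"source": "wiki", "text": "Trial 1354.006 is a phase II study of ..."},
        {"source": "sharepoint", "text": "Protocol amendments for trial 1354.006 ..."},
    ]

def transform(records: list[dict]) -> list[dict]:
    # Pre-process and organize the knowledge: normalize, chunk, tag.
    return [{**r, "tokens": set(r["text"].lower().split())} for r in records]

def retrieve(index: list[dict], query: str, k: int = 2) -> list[str]:
    # Naive keyword-overlap retrieval; a real system would use embeddings.
    q = set(query.lower().split())
    ranked = sorted(index, key=lambda r: len(q & r["tokens"]), reverse=True)
    return [r["text"] for r in ranked[:k]]

index = transform(extract())  # the "load" step: build the retrieval index
context = retrieve(index, "status of trial 1354.006")
prompt = "Answer using only this context:\n" + "\n".join(context)
print(prompt)  # assembling this, not the generation, is the hard part
```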
This is why I strongly suspect that AI will not play out the way the Web did (upstarts unseat giants) and will instead play out like smartphones (giants entrench and balloon).
If all that matters is what you can put into context, then AI really isn't a product in most cases. The people selling models are actually just selling compute, so that space will be owned by the big clouds. The people selling applications are actually just packaging data, so that space will be owned by the people who already have big data in their segment: the big players in each industry. All competitors at this point know how important data is, and they're not going to sell it to a startup when they could package it up themselves. And most companies will prefer to just use features provided by the B2B companies they already trust, not trust a brand new company with all the same data.
I fully expect that almost all of the AI wins will take the form of features embedded in existing products that already have the data (like GitHub with Copilot), not brand new startups who have to try to convince companies to give them all their data for the first time.
Yup. And it's already playing out that way. Anthropic, OpenAI, Gemini: technically not upstarts. All have hyperscalers backing and subsidizing their model training (AWS, Azure, and GCP, respectively). It's difficult to discern where the segmentation between compute and models is here.
> It's difficult to discern where the segmentation between compute and models is here.
Startups can outcompete the foundational model companies by concentrating on creating a very domain-specific model, and providing the support and services that come out of having expertise in that specific domain.
This is why OpenAI chose to co-invest in cybersecurity startups with Menlo Ventures in 2022 instead of building its own dedicated cybersecurity vertical: a partnership-driven growth model nets the most profit with the least resources expended when trying to expand your TAM into a new and very competitive market like cybersecurity.
This is the same reason why hyperscalers like Microsoft, Amazon, and Google themselves have ownership stakes in foundational model companies like Anthropic and OpenAI: at hyperscaler size and revenue, foundational models are just a feature (an important feature, but a feature nonetheless).
Foundational models are a good first start, but are not 100% perfect in a number of fields and use cases. In my experience, tooling built with these models is often used to cut headcount by 30-50% for the team using it to solve a specific problem. And this is why domain-specific startups still thrive: sales, support, services, etc. will still need to be tailored for buyers.
Yes, I agree.
I recently spoke to a doctor who wanted to do a startup, one part of which is an AI agent that can give consumers second opinions on medical questions. For this to be safe, it will require access not only to patient data, but possibly to front-line information from content origins like UpToDate, because that content is necessary to provide grounded answers about information that's not in the training set and not publicly available via search.
The obvious winner is UpToDate who owns that data and the pipeline for originating more content. If you want to build the best AI agent for medical analysis, you need to work with UpToDate.
Yes. I think of Microsoft and SharePoint, for example. Enterprises that are using SharePoint for document and content storage have already organized a subset of their information in a way that benefits Microsoft when it comes to AI agents that are contextually aware of your internal data.
> will instead play out like smartphones (giants entrench and balloon).
Someone correct me if I'm wrong, but didn't smartphones go the "upstarts unseat giants" way? Apple wasn't a phone-maker, and became huge in the phone-market after their launch. Google also wasn't a phone-maker, yet took over the market slowly but surely with their Android purchase.
I barely see any Motorola, Blackberry, Nokia or Sony Ericsson phones anymore, yet those were the giants at one time. Now it's all iOS/Android, two "upstarts" initially.
> Now it's all iOS/Android, two "upstarts" initially.
They weren't upstarts; they were giants who moved into a new (but tightly related) space and pushed out companies whose businesses at first seemed closely related but were actually more different than they first appeared.
Android and iOS won because smartphones were actually mobile computers with a cellular chip, not phones with fancy software. Seen that way Apple was obviously not an upstart, they were a giant that grew even further.
Google is perhaps somewhat more surprising, since they didn't do hardware at all before, but they did have Chrome, giving them a major "in" on the web platform side, and they were also able to leverage their enormous search revenue. Neither resource is available to an upstart/startup.
> Someone correct me if I'm wrong, but didn't smartphones go the "upstarts unseat giants" way?
I think "upstarts" is being used uphthread to mean "startups" and "giants" is being used in a general, not market-specific, sense; that is, it isn't referring to entities that are mere new entrants in a particular market but still potentially quite large and established firms displacing incumbents in the particular market, but new, small-starting firms taking over a newly-opened market segment, beating out the large, established firms (from other markets) that are also trying to compete in it.
> The people selling models are actually just selling compute
Yes, fully agreed. Anything AI is discovering in your dataset could have been found by humans, and it could have been done by a more efficient program. But that would require humans to carefully study the data and write the program. AI lets you skip the novel analysis and the custom programs by using a generalizable program that solves those steps for you, at the cost of far more compute.
I see it this way: AI could remove the most basic obstacle preventing us from applying compute to vast swathes of problems, namely the need to write a unique program for each problem at hand.
> All competitors at this point know how important data is, and they're not going to sell it to a startup when they could package it up themselves.
Except they won't package it themselves because they are inept and inert. They still won't sell it to startups though.
I think you're downplaying how well Cursor is doing "code generation" relative to other products.
Cursor can do at least the following "actions":
* code generation
* file creation / deletion
* run terminal commands
* answer questions about a code base
I totally agree with you on ETL (it's a huge part of our product https://www.definite.app/), but the actions an agent takes are just as tricky to get right.
Before I give Cursor a task, I often doubt it's going to be able to pull it off, and I'm constantly impressed by how deep it can go to complete a complex one.
This really puzzles me. I tried Cursor and was completely underwhelmed. The answers it gave (about a messy 1.5M LOC Spring codebase) were surface-level and unhelpful to anyone but a Java novice. I get vastly better work out of my intern.
To add insult to injury, the IntelliJ plugin threw spurious errors. I ended up uninstalling it and marking my calendar to try again in 6 months.
Yet some people say Cursor is great. Is it something about my project? I can't imagine how it deals with a codebase that is many millions of tokens. Or is it something about me? I'm asking hard questions because I don't need to ask the easy ones.
What are people who think Cursor is great doing differently?
My tinfoil hat theory is that Cursor deploys a lot of “guerilla marketing” with influencers on Twitter/LinkedIn etc. When I tried it, the product was not good (maybe on par with Copilot) but you have people on social media swearing by it. Maybe it just works well for specific types of web development, but I came away thoroughly unimpressed and suspicious that some of the “word of mouth” stuff on them is actually funded by them.
This is a great question and easy to answer with the context you provided.
I don't think your poor experience is because of you, it's because of your codebase. Cursor works worse (in my experience) on larger codebases and seems particularly good at JS (e.g. React, node, etc.).
Cursor excels at things like small NextJS apps. It will easily work across multiple files and complete tasks that would take me ~30 minutes in 30 seconds.
Trying again in 6 months is a good move. As models get larger context windows and Cursor improves (e.g. better RAG) you should have a better experience.
It's for novices and YouTube AI hucksters. It's the coding equivalent of vibrating belts for weight loss.
So isn't Cursor just a tool for Claude or ChatGPT to use? Another example would be a flight booking engine. So why can't an AI just talk directly to an IDE? This is hard, as the process has changed due to the human needing to be in the middle.
So isn't AI useless without tools to manipulate?
I'm very "bullish" on AI in general but find Cursor incredibly underwhelming, because there is little value added compared to basically any other AI coding tool that goes beyond autocomplete. Cursor emphatically does not understand large codebases, and smaller ones (few-file codebases) can just be pasted into a chat context in the worst case.
Is it really that different to Claude with tools via MCP, or my own terminal-based gptme? (https://github.com/ErikBjare/gptme)
I thought it was basically a subset of Aider[0] bolted onto a VS Code fork, and I remain confused as to why we're talking about it so much now, when we didn't talk about Aider before. Some kind of startup-friendly bias? I for one would prefer OSS to succeed in this space.
--
[0] - https://aider.chat/
I agree with you at this time, but there are a couple of things I think will change this:
1. Agentic search can allow the model to identify what context is needed and retrieve that information, internally or externally, through APIs or search (see the sketch after this list).
2. I received an offer from OpenAI for free credits if I shared my API data with them. In other words, they are paying for industry-specific data, probably to fine-tune niche models.
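On point 1, here's a minimal sketch of what such an agentic-search loop could look like; `llm` and `search_api` are hypothetical stand-ins for a model call and an internal/external search endpoint:

```python
# Sketch of an agentic-search loop: the model decides what context it
# needs, retrieves it, and repeats until it can answer.

def agentic_answer(question: str, llm, search_api, max_steps: int = 3) -> str:
    context: list[str] = []
    for _ in range(max_steps):
        # Ask the model what it still needs to know.
        step = llm(f"Question: {question}\nKnown so far: {context}\n"
                   "Reply with a search query, or DONE if you can answer.")
        if step.strip() == "DONE":
            break
        context.extend(search_api(step))  # retrieve via API or search
    return llm(f"Question: {question}\nContext: {context}\nAnswer:")
```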
There could be some exceptions for UI/UX in specific verticals, but the value of these fine-tuned, sector-specific instances will erode over time. They will likely occupy a niche, since enterprise wants maximum configuration, and the more out-of-the-box solutions are oriented around SMEs.
It comes down to moats. Does OpenAI have a moat? It's leading the pack, but the competitors always seem to be catching up to it. We don't see network effects with it yet like with social networks, unless OpenAI introduces household robots for everyone or something, builds a leading marketshare in that segment, and the rich data from these household bots is enough training data that one can't replicate with a smaller robot fleet.
And AI is too fundamental a technology for a "loss leader, biggest wallet wins" strategy, like the one used by Uber, to work.
API access can be restricted. A big part of why Twitter got authwalled was so that AI models can't train on it. Stack Overflow added a no-AI-models clause to its free data dump releases (which are supposed to be CC licensed); they want to be paid if you use their data for AI models.
I wasn't referring to OAI, but rather:
1. Existing legacy players with massive data lock-ins like ERP providers and Google/Microsoft.
2. Massive consolidation within AI platforms, rather than massive fragmentation, if these legacy players do get disrupted or opportunities do pop up.
In other words: the usual suspects will continue to win because they have the data and the lock-in. Any marginal value in having a specialized model, agent workflow, or special training data, etc., will not be significant enough to switch to a niche app.
It is indeed unfortunate and niches will definitely exist. What I am referring to is primarily in enterprise.
I don't think OpenAI have a moat in the traditional sense. Other players offer the exact same API so OpenAI can only win with permanent technical leadership. They may indeed be able to attain that but this is no Coca-Cola.
All you've proposed is moving the context problem somewhere else. You still need to build the search index. It's still a problem of building and providing context.
I disagree: these search indexes already exist, they just need to be navigated, much as Cursor uses agentic search to navigate your codebase or you call Perplexity to get documentation. If the knowledge exists outside of your mind, it can be searched agentically.
What do you think about these guys: https://exa.ai/
To your first point, the LLM still can’t know what it doesn’t know.
Just like you can't google for a movie if you don't know the genre, any scenes, or any actors in it, an AI can't build its own context if it didn't have good enough context already.
IMO that’s the point most agent frameworks miss. Piling on more LLM calls doesn’t fix the fundamental limitations.
TL;DR an LLM can’t magically make good context for itself.
I think you’re spot on with your second point. The big differentiators for big AI models will be data that’s not easy to google for and/or proprietary data.
Lucky they got all their data before people started caring.
> Just like you can’t google for a movie if you don’t know the genre, any scenes, or any actors in it,
ChatGPT was able to answer "What was the video game with cards where you play against a bear guy, a magic guy and a set of robots?" (it's Inscryption). This is one area where LLMs work.
It’s not even just the lack of access to the data, so much hidden information to make decisions is not documented at all. It’s intuition, learned from doing something in a specific context for a long time and only a fraction of that context is accessible.
This is where Microsoft has the advantage, all those Teams calls can provide context.
Yes, this is definitely a big problem.
Anyone that's done any amount of systems integration in enterprises knows this.
Exactly. Sure, as soon as more humans are replaced by agents that leave a full trace in the logs, this fades away, but that will take a long time. It will take many tiny steps in this direction.
> No matter how good the AI gets, it can't answer about what it doesn't know. It can't perform a process for which it doesn't know the steps or the rules
This is exactly the motivation behind https://github.com/OpenAdaptAI/OpenAdapt: so that users can demonstrate their desktop workflows to AI models step by step (without worrying about their data being used by a corporation).
Context is important, but it takes about two weeks to build a context-collection bot and integrate it into Slack. The hard part is not technical; AIs can rapidly build a company-specific and continually updated knowledge base. It's political. Getting a drug company to let you tap Slack, email, docs, etc. is dauntingly difficult.
Difficult to impossible. Their vendors are already working on AI features, so why would they risk adding a new vendor when a vendor they've already approved will have substantially the same capabilities soon?
Because a vendor just using AI tools will not achieve the same capabilities that a vendor that either is OpenAI or is backed by OpenAI will achieve soon.
This problem will be eaten by OpenAI et al. the same way the careful prompting strategies used in 2022/2023 were eaten. In a few years we will have context lengths of 10M+ or online fine tuning, combined with agents that can proactively call APIs and navigate your desktop environment.
Providing all context will be little more than copying and pasting everything, or just letting the agent do its thing.
Super careful or complicated setups to filter and manage context probably won't be needed.
Context requires quadratic VRAM. That's why OpenAI hasn't even supported 200k context length yet for its 4o model.
Is there a trick that bypasses this scaling constraint while strictly preserving the attention quality? I suspect that most such tricks lead to performance loss while deep in the context.
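Rough arithmetic behind the quadratic claim, assuming attention naively materializes the full n x n score matrix per head (tricks like FlashAttention avoid materializing it, though compute stays quadratic); the head count and dtype below are illustrative assumptions, not any specific model's config:

```python
# VRAM for a naively materialized n x n attention matrix, per layer.
# 32 heads and 2-byte (fp16/bf16) entries are illustrative assumptions.

def attn_matrix_gib(n_tokens: int, n_heads: int = 32, bytes_per: int = 2) -> float:
    return n_tokens**2 * n_heads * bytes_per / 2**30

for n in (8_000, 128_000, 200_000):
    print(f"{n:>9,} tokens -> {attn_matrix_gib(n):,.0f} GiB per layer")
# ~4 GiB at 8k, ~977 GiB at 128k, ~2,384 GiB at 200k
```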
I wouldn't bet against this. Whether it's Ring attention, Mamba layers or online fine tuning, I assume this technical challenge will get conquered sooner rather than later. Gemini are getting good results on needle in a haystack with 1M context length.
I suspect the sustainable value will be in providing context that isn't easily accessible as a copy and paste from your hard drive. Whatever that looks like.
Even subpar attention quality is typically better than human memory - we can imagine models that do some sort of triaging from shorter high-quality attention context and extremely long linear (or something else) context.
> Context requires quadratic VRAM
Even if this is not solved, there is so much economic benefit that tens of TBs of VRAM will become feasible.
Even if your context is a trillion tokens in length, the problem of creating that context still exists. It's still ETL and systems integration.
The model can take actions on the computer: give it access to the company wiki and Slack and it can create its own context.
Y'all really are just assuming this technology will stay still, not extrapolating from trends. A model that can get 25% on FrontierMath is probably soon going to be able to navigate your company Slack; that is not a more difficult problem than expert-level math proof development.
To bake a cake from scratch, you must first recreate the universe
I agree, but I do see one realistic solution to the problem you describe. Every product on the market is independently integrating an LLM right now that has access to that product's silo of information. I can imagine a future where a corporate employee interacts with one central LLM that in turn understands the domain of expertise of all the other system-specific LLMs. Given that knowledge, the central one can orchestrate prompting and processing responses from the others.
We've been using this pattern forever with traditional APIs, but the huge hurdle is that the information in any system you integrate with is often both complex and messy. LLMs handle the hard work of dealing with ambiguity and variation.
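A minimal sketch of that orchestration pattern; the specialist registry and every name here are hypothetical:

```python
# A central LLM routes a question to system-specific LLMs and then
# synthesizes their answers. All systems and names are made up.

SPECIALISTS = {
    "crm": "Knows customers, accounts, and deals.",
    "hr": "Knows org charts, policies, and benefits.",
    "wiki": "Knows internal documentation and runbooks.",
}

def orchestrate(question: str, central_llm, specialist_llms) -> str:
    menu = "\n".join(f"- {k}: {v}" for k, v in SPECIALISTS.items())
    # The central model picks which silo-specific LLMs can help.
    picks = central_llm(f"Which of these can answer '{question}'?\n{menu}\n"
                        "Reply with comma-separated keys.").split(",")
    answers = [specialist_llms[p.strip()](question)
               for p in picks if p.strip() in specialist_llms]
    return central_llm(f"Question: {question}\nPartial answers: {answers}\n"
                       "Synthesize one answer:")
```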
I agree that context is one core focus, but I really don't agree that it's the only thing a startup can focus on.
Context aside, you have the generation aspect of it, which can be very important (models trained to output good SQL, or good legal contracts, etc). You have the UI, which is possibly the most important element of a good AI product (think the difference between an IDE and Copilot - very very different UX/UI for the same underlying model).
Context is incredibly important, and I agree that people are downplaying some aspects of ETL here (though this isn't standard ETL in some cases). But it's not even close to being everything.
Startups can still win against big players by building better products faster (with AI), collecting more / better data to feed AI, and then feeding that into better AI automation for customers. Big players won't automatically win, but more data is a moat that gives them room to mess up for a long time and still pull out ahead. Even then, big companies already compete against one another and swallowing a small AI startup can help them and therefore starting one can also make sense.
There are not really any startups in the position to feed AI the great data they have.
I found that fine-tuning and RAG can be replaced with tool calling for some specialized domains, e.g. real-time data. Even things like the user's location can be tool-called, so context can be obtained reliably. I also note that GPT-4o and better are smart enough to chain together different functions you give them, but not reliably. System prompting helps some, but the non-determinism of AI today is both awesome and a curse.
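For illustration, a minimal sketch in the style of the OpenAI tool-calling API (assumes the v1 `openai` SDK; the `get_user_location` tool and its trivial body are made up):

```python
import json
from openai import OpenAI  # assumes the openai>=1.0 SDK

client = OpenAI()

def get_user_location() -> str:
    return "Berlin"  # stand-in for real geolocation

tools = [{
    "type": "function",
    "function": {
        "name": "get_user_location",
        "description": "Return the user's current city.",
        "parameters": {"type": "object", "properties": {}},
    },
}]

messages = [{"role": "user", "content": "What's the weather like where I am?"}]
resp = client.chat.completions.create(model="gpt-4o", messages=messages, tools=tools)
msg = resp.choices[0].message

if msg.tool_calls:  # the model decided it needs the tool
    call = msg.tool_calls[0]
    messages.append(msg)
    messages.append({"role": "tool", "tool_call_id": call.id,
                     "content": json.dumps(get_user_location())})
    resp = client.chat.completions.create(model="gpt-4o", messages=messages, tools=tools)
    print(resp.choices[0].message.content)
```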
Tool calling is just systems integration with a different name. The job of the tool is still to provide context from some other system.
> I found that fine-tuning and RAG can be replaced with tool calling for some specialized domains
RAG is just a single-purpose instance of the more general process of tool calling, so that's not surprising.
All of these comments are premised on this technology staying still. A model with memory and the ability to navigate the computer (we are already basically halfway there) would easily eliminate the problems you describe.
HN, I find, also has a tendency to fall prey to the bitter lesson.
There is a second, related problem: continuous learning. AI models won’t go anywhere as long as their state resets on each new session, and they revert to being like the new intern on their first day.
I somewhat agree. The agent will be able to find the information autonomously. But some data will be proprietary and out of reach for the agent.
Startups should really try to get such a moat. Chapter 2 will cover this.
> There's only one core problem in AI worth solving for most startups building AI powered software: context.
Is this another way of saying "content is king"?
AI code copilots like Cursor provide more immersive context than most other AI products.
And how does that differ from any person without that information?
It doesn't.
And that's why the teams that really want to unlock AI will understand that the core problem is really systems integration and ETL; the AI needs to be aware of the entire corpus of relevant information through some mechanism (tool use, search, RAG, graph RAG, etc.) and the startups that win are the ones that are going to do that well.
You can't solve this problem with more compute or better models.
I've said it elsewhere in this discussion, but the LLM is just a magical oven that's still reliant on good ingredients being prepped and put into the oven before hitting the "bake" button if you want amazing dishes to pop out. If you just want Stouffer's Mac & Cheese, it's already good enough for that.
Wouldn't this just be foundational model + RAG in the limit?
RAG is the action of retrieval to augment generation. Retrieval of what? From where?
The process that feeds RAG is all about how you extract, transform, and load source data into the RAG database. Good RAG is the output of good ETL.
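Concretely, the retrieval step is just a similarity lookup over whatever that ETL produced. A toy sketch, where the hard-coded vectors stand in for a real embedding model applied during the ETL stage:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

# Output of the ETL stage: (chunk_text, embedding) pairs in a store.
store = [
    ("Trial 1354.006 protocol summary ...", [0.9, 0.1, 0.0]),
    ("Cafeteria menu for March ...",        [0.0, 0.2, 0.9]),
]

def rag_context(query_vec, k=1):
    # Retrieval: rank stored chunks by similarity to the query embedding.
    return [text for text, vec in
            sorted(store, key=lambda p: cosine(query_vec, p[1]), reverse=True)[:k]]

print(rag_context([0.8, 0.2, 0.1]))  # retrieves the trial chunk
```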
Yeah seems like context is the AI version of cache invalidation, in the sense of the joke that "there's only 2 hard problems in computer science, cache invalidation and naming things". It all boils down to that (that, and naming things)
And off-by-one errors :)
And an almost fanatical devotion to the Pope
Also, there's only one hard problem in software engineering: people.
Seems to apply to AI as well.
I agree - busy building to solve that :)