
Comment by com2kid

21 hours ago

Outside of code, the current RAG strategy is to throw shit tons of unstructured text, found via vector search, at the model. Some companies are doing better, but the default RAG pipelines are... kind of garbage.

For example, a chatbot doing recipe work should have a RAG DB that, by default, returns entire recipes. A vector DB is actually not the solution here; any number of traditional DBs (relational, or even a document store) would work fine. Sure, do a vector search across the recipe texts, but then fetch the entire recipe from somewhere else. Current RAG solutions can do this, but the majority of RAG deployments I have seen don't bother; they just abuse large context windows.
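
To make that concrete, here's a rough sketch of the two-step fetch in Python. search_recipe_ids() and the recipes table are stand-ins for whatever vector index and database you actually run, not any particular library's API:

    # Step 1: vector search finds WHICH recipes are relevant (ids only).
    # Step 2: the complete recipes come from an ordinary database.
    import sqlite3

    def retrieve_full_recipes(query, search_recipe_ids, k=3):
        ids = list(search_recipe_ids(query, top_k=k))  # stand-in vector search
        db = sqlite3.connect("recipes.db")             # hypothetical recipe store
        placeholders = ",".join("?" for _ in ids)
        rows = db.execute(
            f"SELECT full_text FROM recipes WHERE id IN ({placeholders})", ids
        ).fetchall()
        # whole recipes go into the context, not 15 stitched-together chunks
        return [text for (text,) in rows]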

Which looks like it works, except what you actually have in your context window is 15 different recipes all stitched together. Or, if you put an entire recipe book into the context (which is perfectly doable nowadays!), you'll end up with the chatbot mixing up ingredients and proportions between recipes, because you just voluntarily polluted its context with irrelevant info.

Large context windows allow for sloppy practices that end up producing worse results. Kind of like when we decided web servers needed 16 cores and gigs of RAM to run IBM WebSphere back in the early 2000s, to serve up mostly static pages. The availability of massive servers taught bad habits (huge, complicated XML deployment and configuration files; oodles of processes communicating with each other to serve a single page; etc.).

Meanwhile, in the modern world, I've run mission-critical, high-throughput services for giant companies on a K8s cluster consisting of 3 machines, each with 0.25 CPU and a couple hundred megs of RAM allocated.

Sometimes more is worse.

IMO: Context engineering is a fascinating topic because it starts to approach the abstract, metaphysical question of what LLMs even are.

If you believe that an LLM is a digital brain, then it follows that its limited capabilities today are a result of its limited characteristics (namely: coherent context windows). If we increase context windows (and intelligence), we can simply pack more data into the context, ask specific questions, and let the LLM figure it out.

However, if you have the more grounded belief that, at best, LLMs are just one part of a more heterogeneous digital brain, then it follows that maybe their limitations are actually a result of how we're feeding them data. We need to be smarter about context engineering; we need to do roundtrips with the LLM to narrow down what the context should be; the model needs targeted context to maximize the quality of its output.

The second situation feels so much harder, but more likely. IMO: This fundamental schism is the single reason why ASI won't be achieved on any timeframe worth making a prediction about. LLMs are just one part of the puzzle.

  • Information in an LLM exists in two places:

    1. Embedded in the parameters

    2. Within the context window

    We all talk a lot about #2, but until we get a really good grip on #1, I think we as a field are going to hit a progress wall.

    The problem is we have not been able to separate out knowledge embedded in the parameters from model capability: famously, even if you don't want a model to write code, throwing a bunch of code at it during training makes it a better model overall. (Also famously, even if someone never grows up to work with math day to day, learning math makes them better at all sorts of related logical-thinking tasks.)

    Also, there is plenty of research showing that performance degrades as we stuff more and more into the context. This is why even the best models have limits on tool-call performance when you naively throw 15+ JSON schemas at them. (The technique of using RAG to determine which tool-call schemas to feed into the context window is super cool!)
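
    A rough sketch of that schema-selection trick, with embed() as a stand-in for whatever embedding call you use (a real system would precompute the tool embeddings instead of re-embedding per query):

        # Embed the user request, score every tool's description against it,
        # and hand the model only the top-k schemas instead of all 15+.
        import math

        def cosine(a, b):
            dot = sum(x * y for x, y in zip(a, b))
            norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
            return dot / norm

        def select_tool_schemas(query, tools, embed, k=3):
            # tools: list of (description, json_schema) pairs
            q = embed(query)
            ranked = sorted(tools, key=lambda t: cosine(q, embed(t[0])), reverse=True)
            return [schema for _desc, schema in ranked[:k]]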

    • I wonder if the next phase for leveraging LLMs against large sets of contextual, proprietary data (code repositories and knowledge bases come to mind) is going to look more like smaller models heavily (and regularly) trained/fine-tuned against that proprietary data, and maybe delegated tasks by the ultra-sized, internet-scale omni-brain models.

      If I'm asking Sonnet to agentically make this sign-in button green, does it really matter that it can also write haikus about the Japanese landscape? That links back to your point: we don't have a grip, nearly at all, on how much this crosstalk between problem domains matters. Maybe it actually does matter? But certainly most of it doesn't.

      We're so far from the endgame on these technologies. A part of me really feels like we're wasting too much effort and money on training ASI-grade, ultra-internet-scale models. I'm never going to pay $200+/mo for even a much smarter Claude; what I need is a system that knows my company's code like the back of its hand, knows my company's patterns, technologies, and even business (Jira boards, Google Docs, etc.), and extrapolates from that. That would be worth thousands a month; but what I'm describing isn't going to be solved by a 195-IQ gigabrain, and it also doesn't feel like we're going to get there with context engineering.

  • It's also a question of general vs. specialized tools. If an LLM is being used in a limited capacity, such as retrieving recipes, then a limited environment where it only has the ability to retrieve complete recipes via RAG may be ideal in the literal sense of the word. There really is nothing better than the perfect specialized tool for a specialized job.

    • I did embedded work for years. A 100 MHz CPU with 1-cycle SRAM latency and a bare-metal OS can do as much as a 600 MHz CPU hitting DRAM and running a preemptive OS.

      Specialized tools rock.