Comment by 827a
1 day ago
Similar complaints are happening all over reddit with the Claude Code $200/mo plan and Cursor. The companies with deep VC funding have been subsidizing usage for a year now, but we're starting to see that bleed off.
I think the primary concern of this industry right now is how, relative to the current latest generation models, we simultaneously need intelligence to increase, cost to decrease, effective context windows to increase, and token bandwidths to increase. All four of these things are real bottlenecks to unlocking the "next level" of these tools for software engineering usage.
Google isn't going to make billions on solving advanced math exams.
Agreed, and big context windows are key to mass adoption in wider use cases beyond chatbots (random ex: in knowledge management apps, being able to parse the entire note library/section and hook it into global AI search), but those use cases are decidedly not areas where $200 per month subscriptions can work.
I'll hazard to say that cost and context windows are the two key metrics to bridge that chasm with acceptable results.... As for software engineering though, that cohort will be demanding on all fronts for the foreseeable future, especially because there's a bit of a competitive element. Nobody wants to be the vibecoder using sub-par tools compared to everyone else showing off their GitHub results and making sexy blog posts about it on HN.
Outside of code, the current RAG strategy is to throw shit tons of unstructured text at it that has been found using vector search. Some companies are doing better, but the default RAG pipelines are... kind of garbage.
For example, a chatbot doing recipe work should have a RAG DB that, by default, returns entire recipes. A vector DB is actually not the solution here; any number of traditional DBs (relational or even a document store) would work fine. Sure, do a vector search across the recipe texts, but then fetch the entire recipe from someplace else. Current RAG solutions can do this, but the majority of RAG deployments I have seen don't bother; they just abuse large context windows.
Which looks like it works, except what you actually have in your context window is 15 different recipes all stitched together. Or if you put an entire recipe book into the context (which is perfectly doable nowadays!), you'll end up with the chatbot mixing up ingredients and proportions between recipes because you just voluntarily polluted its context with irrelevant info.
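To make the recipe example concrete, here is a minimal sketch of "search over chunks, then fetch the whole recipe from someplace else". Everything in it is an assumption for illustration: `RECIPES` stands in for a relational or document store, `embed` is a toy bag-of-words substitute for a real embedding model, and `retrieve` is a hypothetical helper name, not any library's API.

```python
import math
import re
from collections import Counter

# Hypothetical document store: recipe id -> full recipe text.
RECIPES = {
    "pancakes": "Pancakes\nIngredients: 200 g flour, 2 eggs, 300 ml milk\nSteps: whisk, rest, fry in butter.",
    "omelette": "Omelette\nIngredients: 3 eggs, 10 g butter, salt\nSteps: beat eggs, cook gently, fold.",
    "pesto": "Pesto\nIngredients: 50 g basil, 30 g pine nuts, 60 ml olive oil, parmesan\nSteps: blend until smooth.",
}

def embed(text):
    # Toy stand-in for an embedding model: lowercase word counts.
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# The search index holds one entry per chunk (here: per line),
# and each entry only points back at its parent recipe id.
INDEX = [
    (recipe_id, embed(line))
    for recipe_id, text in RECIPES.items()
    for line in text.splitlines()
]

def retrieve(query, top_k=1):
    """Rank chunks by similarity, then return whole recipes, deduplicated."""
    q = embed(query)
    ranked = sorted(INDEX, key=lambda entry: cosine(q, entry[1]), reverse=True)
    recipe_ids = []
    for recipe_id, _ in ranked:
        if recipe_id not in recipe_ids:
            recipe_ids.append(recipe_id)
        if len(recipe_ids) == top_k:
            break
    # Fetch complete recipes from the store, not the matched fragments.
    return [RECIPES[recipe_id] for recipe_id in recipe_ids]
```

Calling `retrieve("basil and pine nuts")` hands the model one complete pesto recipe instead of a pile of fragments from several recipes. The search index is only used to find which recipe matters; the context is built from the store.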
Large context windows allow for sloppy practices that end up making for worse results. Kind of like when we decided web servers needed 16 cores and gigs of RAM to run IBM WebSphere back in the early 2000s, to serve up mostly static pages. The availability of massive servers taught bad habits (huge complicated XML deployment and configuration files, oodles of processes communicating with each other to serve a single page, etc).
Meanwhile in the modern world I've run mission-critical, high-throughput services for giant companies on a K8s cluster consisting of 3 machines, each with 0.25 CPU and a couple hundred megs of RAM allocated.
Sometimes more is worse.
IMO: Context engineering is a fascinating topic because it starts approaching the metaphysical abstract idea of what LLMs even are.
If you believe that an LLM is a digital brain, then it follows that its limitations in capability today are a result of its limited characteristics (namely: coherent context windows). If we increase context windows (and intelligence), we can simply pack more data into the context, ask specific questions, and let the LLM figure it out.
However, if you have a more grounded belief that, at best, LLMs are just one part of a more heterogeneous digital brain, it follows that maybe their limitations are actually a result of how we're feeding them data. That is, we need to be smarter about context engineering: we need to do roundtrips with the LLM to narrow down what the context should be, because it needs targeted context to maximize the quality of its output.
The second situation feels so much harder, but more likely. IMO: This fundamental schism is the single reason why ASI won't be achieved on any timeframe worth making a prediction about. LLMs are just one part of the puzzle.
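A rough sketch of what "roundtrips to narrow down the context" could look like, under heavy assumptions: `ask_llm` is a hypothetical callable (prompt in, text out) that you would wire to whatever model you use, and `answer_with_narrowed_context` is an illustrative name, not an existing API. The first call sees only titles; the second sees only the documents the model picked.

```python
from typing import Callable, Dict

def answer_with_narrowed_context(
    question: str,
    docs: Dict[str, str],
    ask_llm: Callable[[str], str],  # hypothetical: prompt in, reply text out
) -> str:
    # Round trip 1: show only the titles and ask which ones matter.
    titles = "\n".join(docs)
    picked = ask_llm(
        f"Question: {question}\n"
        f"Titles:\n{titles}\n"
        "Reply with the relevant titles, one per line."
    )
    # Keep only replies that exactly match a known title.
    selected = [line.strip() for line in picked.splitlines() if line.strip() in docs]

    # Round trip 2: answer using only the selected documents as context.
    context = "\n\n".join(docs[title] for title in selected)
    return ask_llm(f"Context:\n{context}\n\nQuestion: {question}")
```

The point of the design is that nothing irrelevant ever reaches the second prompt. A real version would also need a fallback for when the model picks no known title.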
Big, coherent context windows are key to almost all use cases. The house-of-cards RAG implementations most platforms are using right now are pretty bad. You start asking around about how to implement RAG and you realize: No one knows. The architecture and outcomes at every company are pretty bad, and the most common words you hear are "yeah it pretty much works ok i guess".
> Similar complaints are happening all over reddit with the Claude Code $200/mo
I would imagine 95% of people never get anywhere near hitting their CC usage limits. The people who are getting rate-limited have ten windows open, are auto-accepting edits, and are YOLO'ing away any kind of coherent code quality in their codebase.