
Comment by pllbnk

3 hours ago

I think it’s a mistake to assume that we will blindly go in this direction for many years and then suddenly, collectively wake up and realize what we have done. It’s a great filter and a great opportunity.

If LLMs stop improving at the pace of the last few years (I believe they are already slowing down), they will still manage to crank out billions of lines of code that they themselves won’t be able to grep and reason through, leading to a drop in quality and lost revenue for the companies that choose to go all-in on LLMs.

But let’s be realistic: modern LLMs are still a great and useful tool when used properly, so they are here to stay. Our goal will be to keep them on track and to reduce the negative impact of hallucinations.

As a result, the software industry will move away from large, complex, interconnected systems with millions of features, only a few of them actively used, toward small, high-quality, targeted tools, because their work will be easier to verify and their side effects easier to control.

> If LLMs stop improving at the pace of the last few years (I believe they already are slowing down)

Depending on how you measure "improvement", they already have, or they never will :-/

Measuring a model's capability as a function of its context length, you hit the limit at around 300k-400k tokens of context; beyond that you get diminishing returns. We have already passed this point.

Measuring capability purely by output, smarter harnesses may unlock even further improvements in the future; basically a twist on the "Sufficiently Smart Compiler" (https://wiki.c2.com/?SufficientlySmartCompiler=).

Those are the two extremes, but there's more on the spectrum in between.

  • 300k-400k isn’t the current limit if you create modules and/or organize the code reasonably, for the same reason we do this for humans: it allows us to interact with a component without loading its internals into our context.

    You can also execute tasks larger than this by using subagents to divide the work so that no segment exceeds the usable context window. I regularly execute tasks that require hundreds of subagents, for example.

    In practice the context window is effectively unlimited, or at least exceptionally large (100M+ tokens); it just requires you to structure the work so it can be done effectively, not so dissimilar to what you would do for a person.

    • That makes it not a context window.

      Organizing code the way you describe, and designing how agents interact with it so that the actual context window stays small, is the fundamental challenge.

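The decomposition strategy described above can be sketched in a few lines: split a large body of work into pieces that each fit a per-subagent token budget, then hand each piece to its own subagent. This is an illustrative sketch only; the function names and the rough 4-characters-per-token estimate are assumptions, not any particular tool's API.

```python
def estimate_tokens(text: str) -> int:
    # Crude heuristic: roughly 4 characters per token for English-like text.
    return max(1, len(text) // 4)

def chunk_by_budget(items: list[str], budget_tokens: int) -> list[list[str]]:
    """Greedily pack work items into chunks whose estimated token count
    stays within budget_tokens, so no single subagent overflows its context."""
    chunks: list[list[str]] = []
    current: list[str] = []
    used = 0
    for item in items:
        cost = estimate_tokens(item)
        if current and used + cost > budget_tokens:
            chunks.append(current)
            current, used = [], 0
        current.append(item)
        used += cost
    if current:
        chunks.append(current)
    return chunks

# Example: ten "files" of 2,000 characters each (~500 tokens apiece).
files = ["x" * 2000 for _ in range(10)]
chunks = chunk_by_budget(files, budget_tokens=1200)
# Each chunk is then handed to its own subagent; an orchestrator
# merges the per-chunk results afterwards.
```

With a 1,200-token budget and ~500-token files, each subagent receives two files; the orchestration step that merges the results is where the real engineering effort lies.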

I wish I got to hallucinate at work, and just get a pat on the head for constantly doing the wrong thing.

  • Maybe I am unlucky, but I have worked with too many developers who couldn't make a good decision if their life depended on it. LLMs at least know how to convince you of their decisions with strong arguments.

    • Mmm, I feel it’s more common for them to just blindly agree with whatever you say.

      Assistant: “I propose A”

      User: “Actually B is better”

      Assistant: “you’re absolutely right”

      User: “actually let’s go with C”

      Assistant: “Good choice, [reasons]”

      User: “wait A is better”

      Assistant: “Great decision!”