Comment by noosphr

10 hours ago

This article answers the question in its second paragraph, then completely ignores that answer for the rest of the piece.

>My understanding is that this represents 3-4 “generations” of different technology (propellers, turbojets, etc). Each technology went through normal iterative improvement, then, when it reached its fundamental limits, got replaced by a better technology. The last technology, ramjets, reached its limit at about 3500 km/h, and there wasn’t the economic/regulatory will to develop anything better, so the record stands.

You don't have one sigmoid; you have multiple sigmoids, each stacked on top of the last. Airplanes aren't just one technology; they are multiple technologies that happen to do the same thing.

Each technology follows its own sigmoid faithfully. The overall curve only looks exponential(ish) because of unpredictable discoveries that let you switch to another sigmoid with a higher maximum potential.
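To make that concrete, here is a minimal sketch (toy generations and parameters of my own, not fitted to anything): three successive technologies, each saturating at a roughly 10x higher ceiling, whose sum tracks an exponential.

```python
import numpy as np

def sigmoid(t, midpoint, ceiling):
    """One technology generation: an S-curve saturating at `ceiling`."""
    return ceiling / (1 + np.exp(-(t - midpoint)))

t = np.linspace(0, 60, 300)
# Successive generations (think props -> turbojets -> ramjets) with
# evenly spaced takeoffs and geometrically growing ceilings.
total = sum(sigmoid(t, m, c) for m, c in [(10, 1), (25, 10), (40, 100)])
# On a log scale the envelope is roughly a straight line, i.e. it
# looks exponential, even though every component saturates.
print(np.log10(total[::50]).round(2))
```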

The same is true in AI. If you used the same architecture as GPT2 today, you'd be in for a bad time training a new frontier model. It's only because we have had dozens of breakthroughs that model capabilities have improved as much as they have.

That said, exponentials and sigmoids are both the wrong model to use for growth. Growth is a differential equation: it has independent inputs, it has outputs, and some of those outputs feed back in as inputs through causal chains of arbitrary complexity. What happens depends entirely on the specific DE that governs the given technology. We can easily get a chaotic system with completely random booms and busts that have no deep fundamental rhyme or reason. We currently call that the economy.
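A toy illustration of the chaos point, using the simplest growth rule with feedback (the discrete logistic map; the parameters here are mine and purely illustrative):

```python
# Discrete logistic map: output feeds straight back into input.
# Gentle feedback (r = 2.8) settles onto a plateau, sigmoid-style;
# strong feedback (r = 4.0) gives deterministic chaos -- booms and
# busts with no deep fundamental rhyme or reason.
def logistic_map(r, x0=0.2, steps=12):
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1 - xs[-1]))
    return xs

print([round(x, 3) for x in logistic_map(2.8)])  # damps toward ~0.643
print([round(x, 3) for x in logistic_map(4.0)])  # never settles
```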

The book "Origins of Efficiency" by Brian Potter discusses this. Stacked sigmoids are a well-understood idea in innovation.

The idea that exponential growth will continue via stacked sigmoids is also not a given. An example is the nail. Nails used to be about half a percent of US GDP. That's a pretty big number! A series of innovations, each with its own sigmoid, stacked on top of each other to reduce the cost of nails, which dropped by over 90%.

But eventually nail manufacturing reached a floor. Since the mid-20th century we haven't gotten much better at making nails, and their cost has actually started increasing slightly. We ran out of new innovation sigmoids, so we got stuck on the last one.

So what you actually have to predict is whether there will continue to be new sigmoids, not whether the existing sigmoid will asymptote (we already know it will).

This is much more difficult to forecast, because new sigmoids (major new innovations) tend to be unpredictable events. Not only are the particulars hard to forecast (if they were knowable, the innovation would already have happened), but whether a major innovation will occur at all is also hard to forecast, because new sigmoids are separate from any existing sigmoid trend.

So we are left with the idea that all current innovations in AI will asymptote in their scaling as they reach the plateau of the sigmoid, but there may be new sigmoids that keep the overall trend up. Or there may not be. We don't know.

That's not very satisfying, so we'll get to keep reading articles like this one.

  • I don't disagree with you, but your example of nails and their cost reductions made me wonder whether we reached a meaningful limit in, say, fundamental material terms, or whether we just reached a limit on return on investment.

    Return on investment can be too low because the investment required is really high, but it can also be too low because the returns are just limited. Once prices had dropped 90%, nails surely became even more ubiquitous, but at that point there's only so much more money to dig out of the cost-reduction hole. It feels plausible that there were ideas for more digging, but the reward just wasn't there in the market, especially versus just selling what worked.

    I bring it up because the distinction in one specimen may speak to a larger trend: do new sigmoid developments more often fail to materialize because of serious physical limits or a lack of good ideas, or because of limits on ROI? (Or something else entirely?)

    In the arena of AI, the ROI on more intelligence/unit-cost seems pretty high right now. So the difficulty of applying any potential innovation would have to be staggering for none to be pursued, or there'd have to just not be any good ideas left to try.

    Overall, I think there are ideas to try. So in my opinion, that shakes out to justify a bullish sentiment on sigmoids continuing to stack until the perceived potential gains from more intelligence/unit-cost somehow fall off.

    Like I said, I don't disagree, we really don't know. But I feel it's a good bet that there's more coming.

Yes, I was surprised he never discussed the idea that such exponentials are typically made of stacked sigmoids.

That said... if the exponential is made of stacked sigmoids, it's still an exponential on the whole! The fact that it's made of stacked sigmoids is relevant to the engineers making it, but not so relevant to the users or those otherwise affected by it.

  • Only so long as you can keep inventing the next sigmoid in the stack.

    • Right, and thence back to the point of TFA.

      Either you black-box the curve and assume that sigmoids will keep stacking for about as long as you have already seen them stack.

      Or you white-box it and make some actual technical argument about why the curves can't keep stacking.

      There are plenty of plausible arguments here. Scott is not arguing that the exponential must go on forever.

      He’s making a meta-level point about the debate; you have to pick one of the above, and you can’t just argue that “now is the time the s-curves will stop stacking” without providing some justification.

    • Sure, but we have no prior reason to expect that the 'rate of discoveries' is going to drop off significantly in the next few years. Certainly not stop entirely.


> If you used the same architecture as GPT2 today, you'd be in for a bad time training a new frontier model. It's only because we have had dozens of breakthroughs

What exactly are these dozens of breakthroughs? Most frontier model architectures today still look very much like GPT2 at their core. There have been various improvements: InstructGPT, fine-tuning techniques, efficiency gains from KV caches, faster attention, LoRA, better tokenizers, etc. Most of these are about making things run faster. The biggest differentiators have probably been data curation, post-training data, and the ability to fit more into the model. But I think we have had few breakthroughs that would fall into the category of different technologies.
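To illustrate why something like a KV cache is an efficiency improvement rather than a new technology, here's a minimal single-head sketch (shapes and names are illustrative, not from any real codebase): each decode step projects only the newest token and appends its key/value to the cache, instead of re-projecting the whole prefix. The attention math itself is unchanged from the GPT2 era.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def decode_step(x_t, W_q, W_k, W_v, cache):
    """One autoregressive step of single-head attention with a KV cache.

    Without the cache we'd re-project K and V for every past token on
    every step; with it, each step does O(1) new projections plus one
    O(t) attention read.
    """
    q = x_t @ W_q                    # query for the newest token
    cache["K"].append(x_t @ W_k)     # cache this token's key...
    cache["V"].append(x_t @ W_v)     # ...and value, once
    K = np.stack(cache["K"])         # (t, d): all keys so far
    V = np.stack(cache["V"])         # (t, d): all values so far
    w = softmax(q @ K.T / np.sqrt(q.size))  # (t,) attention weights
    return w @ V                     # (d,) attended output

# Toy usage: decode 5 tokens against a growing cache.
d, rng = 8, np.random.default_rng(0)
W_q, W_k, W_v = (rng.normal(size=(d, d)) for _ in range(3))
cache = {"K": [], "V": []}
for _ in range(5):
    out = decode_step(rng.normal(size=d), W_q, W_k, W_v, cache)
print(out.shape, len(cache["K"]))  # (8,) 5
```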

Something that deeply frustrates me, as someone who did R&D on model architectures, is how similar modern LLM architectures are to GPT2.

(This is a bit disingenuous, since much or most of the work is spent on the scaling and training side of things.)