Comment by alganet

1 month ago

There is a circular argument going on in this article. Basically:

> "Flexible is better. Why? Because specific has been consistently worse"

I mean, I don't deny the historical trend. But really, why is it so? It makes sense to follow the trend, but knowing more about the reason why would be cool.

Also, I feel that the human cognitive aspects of "engineering some shit" are being ignored.

People are engineering solutions not only to be efficient, but to reach specific vantage points from which they can see further. They do it so they can see what the "next gen flexible stuff" looks like before others do.

Finally, it assumes the option to scale computation is always available and ignores the diminishing returns of trying to scale vanguard technology.

The scale requirements for AI stuff are getting silly real fast due to this unshakable belief in infinite scaling. To me, they're already too silly. Maybe we need to cool down, engineer some stuff, and figure out where the comfortable threshold lies.

> But really, why is it so?

I don't think the bitter lesson is (or should be?) that you don't need to engineer biases in because with enough compute you could learn them instead ... The real lesson is that finding the right (minimal) set of biases is hard, so you should always learn them instead where possible. Systems that learn are liable to outperform those that don't because, by letting the data speak for itself, they end up with the right biases.

I think this is why the bitter lesson has historically been correct: people have built too much into AI systems, got it wrong, and made them limited and brittle as a result.

The same thing is continuing to happen today with pre-trained LLMs (a fixed set of patterns!) and bespoke test-time reasoning approaches (of which at most one, and likely none, is optimal). Of course it's understandable how we got here, even if it's a bit like the drunk looking for his lost car keys under the street lamp rather than in the dark spot where he lost them. Continuous incremental learning (no training-time vs. inference-time distinction) is an unsolved problem, and the basis of intelligence/reasoning is not widely understood.

  • > finding the right (minimal) set of biases is hard

    I'm not familiar with this concept of a minimal set of biases.

    The way I see it, it's a series of loosely community-defined thresholds. If there's something like a theory that defines those biases in a formal way, I would very much like to read it.

    • I'm just using (inductive) "bias" to refer to assumptions about the data that are built into the model and the way it learns, such as the way a CNN is built to assume spatial locality of patterns.

      My point is that the more biases you build in, especially with a poorly understood goal like AGI, the more chance you have of either getting it wrong or simply over-constraining the model. In general, less is more: you want the minimal set of biases necessary to learn effectively, without adding others that the model could learn better for itself (providing greater generality).
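      To make the CNN example concrete, here's a rough PyTorch sketch (purely illustrative; the 32x32 input and 8-channel sizes are arbitrary): a conv layer bakes in locality and weight sharing, while a fully connected layer over the same input assumes nothing about spatial structure and pays for that in parameters.

      ```python
      import torch
      import torch.nn as nn

      # Toy 32x32 single-channel input.
      x = torch.randn(1, 1, 32, 32)

      # Built-in biases: locality (3x3 receptive field) and translation
      # equivariance (the same weights are applied at every position).
      conv = nn.Conv2d(in_channels=1, out_channels=8, kernel_size=3, padding=1)

      # Same input and output sizes, but no spatial assumptions at all;
      # any notion of locality has to be learned from the data.
      fc = nn.Linear(32 * 32, 8 * 32 * 32)

      print(sum(p.numel() for p in conv.parameters()))  # 80
      print(sum(p.numel() for p in fc.parameters()))    # 8,396,800
      ```

      Those two parameter counts are the bias trade-off in miniature: the conv layer can generalize from far less data when the locality assumption holds, and is handicapped when it doesn't.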