Comment by HarHarVeryFunny

1 month ago

> But really, why is it so?

I don't think the bitter lesson is (or should be?) that you don't need to engineer biases in because, with enough compute, you could learn them instead ... The real lesson is that finding the right (minimal) set of biases is hard, and you should therefore always learn them instead if possible. Systems that learn are liable to outperform those that don't because, by letting the data speak for itself, they learn the right biases.

I think this is why historically the bitter lesson has been correct - because people have built too much into AI systems, got it wrong, and made them limited and brittle as a result.

The same thing is continuing to happen today with pre-trained LLMs (a fixed set of patterns!) and bespoke test-time reasoning approaches (of which at most one, and likely none, is optimal). Of course it's understandable how we got here, even if it's a bit like the drunk looking for his lost car keys under the street lamp rather than in the dark spot where he lost them. Continuous incremental learning (no training vs. inference time distinction) is an unsolved problem, and the basis of intelligence/reasoning is not widely understood.

> finding the right (minimal) set of biases is hard

I'm not familiar with this concept of a minimal set of biases.

The way I see it, it's a series of loosely community-defined thresholds. If there's something like a theory that defines those biases in a formal way, I would very much like to read it.

  • I'm just using (inductive) "bias" to refer to assumptions about the data that are built into the model and the way it learns, such as the way a CNN is built to assume spatial locality of patterns.

    My point is that the more biases you build in, especially with a poorly understood goal like AGI, the more chance you have of either getting them wrong or simply over-constraining the model. In general, less is more: you want the minimal necessary set of biases to learn effectively, without adding others that the model could learn better for itself (providing greater generality).
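To make the CNN example concrete, here's a rough sketch (my own illustration, with assumed layer sizes) of what the spatial-locality bias buys you: a convolution shares one small kernel across all positions, while a fully connected layer, which assumes nothing about the input's structure, needs a separate weight for every input/output pair.

```python
# Parameter-count comparison illustrating the inductive bias a CNN builds in
# (spatial locality + weight sharing) versus a fully connected layer that
# makes no assumptions about input structure. Sizes here are illustrative.

def conv_params(in_ch, out_ch, k):
    """Weights + biases for a k x k convolution; kernel shared across positions."""
    return out_ch * (in_ch * k * k) + out_ch

def dense_params(in_features, out_features):
    """Weights + biases for a fully connected layer; one weight per in/out pair."""
    return out_features * in_features + out_features

# A 32x32 RGB image mapped to 16 feature maps of the same spatial size.
h = w = 32
in_ch, out_ch, k = 3, 16, 3

conv = conv_params(in_ch, out_ch, k)                 # 16*(3*3*3) + 16 = 448
dense = dense_params(in_ch * h * w, out_ch * h * w)  # ~50 million

print(conv)   # 448
print(dense)  # 50348032
```

The roughly 100,000x gap is the bias made visible: the CNN is hard-wired to assume that nearby pixels matter most and that the same pattern detector is useful everywhere, so it never has to spend capacity (or data) learning that from scratch.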