← Back to context

Comment by gessha

2 months ago

As a person who has trained a number of computer vision deep networks, I can tell you that we have some cool-looking visualizations on how lower layers work but no idea how later layers work. The intuition is built over training numerous networks and trying different hyperparameters, data shuffling, activations, etc. it’s absolutely brutal over here. If the theory was there, people like Karpathy who have great teacher vibes would’ve explained it for the mortal grad students or enthusiast tinkerers.

> The kind of progress being made leads me to believe there absolutely ARE people who absolutely know how the LLMs work and they're not just a bunch of monkeys randomly throwing things at GPUs and seeing what sticks

I say this less as an authoritative voice but more as an amused insider: Spend a week with some ML grad students and you will get a chuckle whenever somebody says we’re not some monkeys throwing things at GPUs.

It may be as simple as this:

https://youtube.com/shorts/7GrecDNcfMc

Many many layers of that. It’s not a profound mechanism. We can understand how that works, but we’re dumbfounded how such a small mechanism is responsible for all this stuff going on inside a brain.

I don’t think we don’t understand, it’s a level beyond that. We can’t fathom the implications, that it could be that simple, just scaled up.

  • > Many many layers of that. It’s not a profound mechanism

    Bad argument. Cavemen understood stone, but they could not build the aqueducts. Medieval people understood iron, water and fire but they could not make a steam engine

    Finally we understand protons, electrons, and neutrons and the forces that government them but it does not mean we understand everything they could mossibly make

    • "Cavemen understood stone"

      How far removed are you from a caveman is the better question. There would be quite some arrogance coming out of you to suggest the several million years gap is anything but an instant in the grand timeline. As in, you understood stone just yesterday ...

      The monkey that found the stone is the monkey that built the cathedral. It's only a delusion the second monkey creates to separate it from the first monkey (a feeling of superiority, with the only tangible asset being "a certain amount of notable time passed since point A and point B").

      "Finally we understand protons, electrons, and neutrons and the forces that government them but it does not mean we understand everything they could mossibly make"

      You and I agree. That those simple things can truly create infinite possibilities. That's all I was saying, we cannot fathom it (either because infinity is hard to fathom, or that it's origins are humble - just a few core elements, or both, or something else).

      Anyway, this can discussion can head into any direction.