
Comment by verdverm

4 days ago

It does not seem like they are doing inference-time weight changes, to the tune of running backprop. It sounds more like they are applying a pre-trained vector to the model and selecting that vector based on the input, in a two-step process.
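
A minimal sketch of that two-step idea (purely illustrative: the bank of vectors, the keyword-based selector, and the singular-value scaling in `apply_z_vector` are assumptions, not the paper's actual implementation):

```python
import numpy as np

# Hypothetical bank of pre-trained "z-vectors", one per task family.
Z_BANK = {
    "math":   np.array([1.2, 0.8, 1.0, 1.1]),
    "coding": np.array([0.9, 1.3, 1.0, 0.7]),
    "chat":   np.array([1.0, 1.0, 1.0, 1.0]),
}

def select_z_vector(prompt: str) -> np.ndarray:
    """Step 1: pick a pre-trained z-vector based on the input.
    A crude keyword check stands in for whatever learned classifier is used."""
    text = prompt.lower()
    if any(tok in text for tok in ("solve", "integral", "prove")):
        return Z_BANK["math"]
    if any(tok in text for tok in ("def ", "class ", "compile")):
        return Z_BANK["coding"]
    return Z_BANK["chat"]

def apply_z_vector(weight: np.ndarray, z: np.ndarray) -> np.ndarray:
    """Step 2: adapt a layer's weights with the chosen vector.
    Here z rescales the singular values of the weight matrix; note that
    no backprop happens at inference time, only this cheap modulation."""
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    return u @ np.diag(s * z) @ vt

if __name__ == "__main__":
    w = np.random.randn(4, 4)                      # stand-in for one layer
    z = select_z_vector("Solve the integral of x^2")
    print(apply_z_vector(w, z).shape)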

That’s my general understanding as well, but it isn’t a large conceptual leap to go from real-time selection of pretrained “z-vectors” to real-time generation of the same. The larger conceptual breakthrough, together with the demonstration of its effectiveness, is the big success here.

  • While not a large conceptual leap, the real-time generation of "z-vectors" is not cheap in terms of compute or data requirements, the latter of which I see as the main issue. How are you going to generate the vector from a single real-time input?

    I have yet to see anything that dissuades me from agreeing with Yann LeCun when he says Transformers are fundamentally limited. We won't get creativity or reasoning, or even move past hallucinations, without a major breakthrough.

  • The interesting thing here is that the human brain also seems to use pretrained ... things. For vision, use the visual subsystem. For hearing, use the auditory subsystem. For movement ... you get the point. Plus you can combine these pretrained ... things, so, for example, complex movement like balancing on a tightrope draws on multiple subsystems (try standing on one leg with your eyes closed).

    Z-vectors are of course nothing like the subsystems in your brain, but in general the approach is certainly similar to how the brain works.

Sort of. According to the text they can use multiple z-vectors (sets of weights that select for parts of the system to be used to answer a specific question) simultaneously, using a "simple optimization algorithm" to determine the relative weight for each of these vectors.
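
To make that concrete, here is a hedged sketch of combining multiple z-vectors with mixing weights found by a "simple optimization algorithm". The loss function, the random-search optimizer, and the vector values are all illustrative assumptions rather than the paper's actual procedure:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical pre-trained z-vectors (e.g. math / coding / chat), stacked row-wise.
Z_VECTORS = np.stack([
    np.array([1.2, 0.8, 1.0, 1.1]),
    np.array([0.9, 1.3, 1.0, 0.7]),
    np.array([1.0, 1.0, 1.0, 1.0]),
])

def combine(alphas: np.ndarray) -> np.ndarray:
    """Softmax-normalize the mixing weights and blend the z-vectors."""
    w = np.exp(alphas - alphas.max())
    w /= w.sum()
    return w @ Z_VECTORS

def loss_on_few_shot_examples(z: np.ndarray) -> float:
    """Stand-in for scoring the adapted model on a handful of examples
    from the incoming question; a fixed target is used here for illustration."""
    target = np.array([1.1, 1.0, 1.0, 0.9])
    return float(np.sum((z - target) ** 2))

def optimize_mixture(iters: int = 200) -> np.ndarray:
    """A deliberately simple optimizer (random local search) over the
    relative weights of the z-vectors."""
    best_a = np.zeros(len(Z_VECTORS))
    best_loss = loss_on_few_shot_examples(combine(best_a))
    for _ in range(iters):
        cand = best_a + rng.normal(scale=0.3, size=best_a.shape)
        cand_loss = loss_on_few_shot_examples(combine(cand))
        if cand_loss < best_loss:
            best_a, best_loss = cand, cand_loss
    return combine(best_a)

if __name__ == "__main__":
    print("blended z-vector:", np.round(optimize_mixture(), 3))
```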