Comment by dheera

1 year ago

I mean, transformer-based LLMs are RNNs, just really really really big ones with very wide inputs that maintain large amounts of context.

No. An RNN has an arbitrarily-long path from old inputs to new outputs, even if in practice it can't exploit that path. Transformers have fixed-size input windows.
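
A minimal sketch of the distinction being argued here, in plain NumPy (the function names, shapes, and 2048-token window are illustrative, not taken from any particular model): the recurrent step keeps a fixed-size hidden state and updates it once per token, so the very first input always has a path, however long, to the latest state; the attention step only ever sees the tokens inside its finite window, so anything older is simply not an input at all.

    import numpy as np

    def rnn_state(xs, W_h, W_x):
        # Fixed-size state h, updated once per input token; information from
        # xs[0] can still be present in h after arbitrarily many steps.
        h = np.zeros(W_h.shape[0])
        for x in xs:
            h = np.tanh(W_h @ h + W_x @ x)
        return h

    def attention_out(xs, W_q, W_k, W_v, window=2048):
        # Only the last `window` tokens are inputs at all; older tokens have
        # no path to the output. (Causal masking omitted for brevity.)
        xs = np.asarray(xs)[-window:]
        Q, K, V = xs @ W_q, xs @ W_k, xs @ W_v
        scores = Q @ K.T / np.sqrt(K.shape[-1])
        w = np.exp(scores - scores.max(axis=-1, keepdims=True))
        return (w / w.sum(axis=-1, keepdims=True)) @ V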

  • A chunk of the output still goes back into the transformer input, so the arbitrarily long path still exists; it just goes through a decoding/encoding step.

  • No, you can give as much context to a transformer as you want; you just run out of memory.

    • An RNN doesn't run out of memory from that, so they are still fundamentally different.

      How do you encode arbitrarily long positions, anyway?

  • You can't have a fixed state and an arbitrarily long path from the input. Well, you can, but then it's meaningless, because you fundamentally cannot keep stuffing information of arbitrary length into a fixed state. RNNs effectively have fixed-size input windows.

    • The path is arbitrarily long, not wide. It is possible to build an RNN that remembers the first word of the input, no matter how long the input is (a toy version of this is sketched after the thread). This is not possible with a transformer, so we know they are fundamentally different.

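A toy version of the construction claimed in the last reply (illustrative Python, not any trained model): a recurrent cell can latch the first token into its fixed-size state and carry it unchanged for the rest of the sequence, however long, which is exactly what a fixed attention window cannot do once the first token falls outside the window.

    def first_token_rnn(tokens):
        # Fixed-size state: (first_token, seen_anything). The state never
        # grows, yet the first token survives a sequence of any length.
        state = (None, False)
        for tok in tokens:
            first, seen = state
            state = (tok if not seen else first, True)
        return state[0]

    assert first_token_rnn(["the"] + ["filler"] * 1_000_000) == "the"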