Comment by dheera

1 year ago

I mean, transformer-based LLMs are RNNs, just really really really big ones with very wide inputs that maintain large amounts of context.

No. An RNN has an arbitrarily-long path from old inputs to new outputs, even if in practice it can't exploit that path. Transformers have fixed-size input windows.
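
A minimal sketch of the distinction being argued here, in plain NumPy (the function names, shapes, and 2048-token window are illustrative, not taken from any particular model): the recurrent step keeps a fixed-size hidden state and updates it once per token, so the very first input always has a path, however long, to the latest state; the attention step only ever sees the tokens inside its finite window, so anything older is simply not an input at all.

    import numpy as np

    def rnn_state(xs, W_h, W_x):
        # Fixed-size state h, updated once per input token; information from
        # xs[0] can still be present in h after arbitrarily many steps.
        h = np.zeros(W_h.shape[0])
        for x in xs:
            h = np.tanh(W_h @ h + W_x @ x)
        return h

    def attention_out(xs, W_q, W_k, W_v, window=2048):
        # Only the last `window` tokens are inputs at all; older tokens have
        # no path to the output. (Causal masking omitted for brevity.)
        xs = np.asarray(xs)[-window:]
        Q, K, V = xs @ W_q, xs @ W_k, xs @ W_v
        scores = Q @ K.T / np.sqrt(K.shape[-1])
        w = np.exp(scores - scores.max(axis=-1, keepdims=True))
        return (w / w.sum(axis=-1, keepdims=True)) @ V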

  • A chunk of the output still goes back into the transformer input, so the arbitrarily long path still exists; it just goes through a decoding/encoding step.

  • No, you can give as much context to a transformer as you want; you just run out of memory.

    • An RNN doesn't run out of memory from that, so they are still fundamentally different.

      How do you encode arbitrarily long positions, anyway?

  • You can't have a fixed state and an arbitrarily long path from the input. Well, you can, but then it's meaningless, because you fundamentally cannot keep stuffing information of arbitrary length into a fixed state. RNNs effectively have fixed-size input windows.

    • The path is arbitrarily long, not wide. It is possible to build an RNN that remembers the first word of the input, no matter how long the input is (a toy version of this is sketched after the thread). This is not possible with a transformer, so we know they are fundamentally different.

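A toy version of the construction claimed in the last reply (illustrative Python, not any trained model): a recurrent cell can latch the first token into its fixed-size state and carry it unchanged for the rest of the sequence, however long, which is exactly what a fixed attention window cannot do once the first token falls outside the window.

    def first_token_rnn(tokens):
        # Fixed-size state: (first_token, seen_anything). The state never
        # grows, yet the first token survives a sequence of any length.
        state = (None, False)
        for tok in tokens:
            first, seen = state
            state = (tok if not seen else first, True)
        return state[0]

    assert first_token_rnn(["the"] + ["filler"] * 1_000_000) == "the"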