Comment by imjonse

1 year ago

To their credit, the authors (Y. Bengio among them) end the paper with that question rather than suggesting they know the answer. These models are very small even by academic standards, so any findings would not necessarily extend to current LLM scales. The main conclusion is that RNN-class networks can be trained as efficiently as modern alternatives, but the resulting performance is only competitive at small scale.
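For context on the training-efficiency claim, here is a minimal sketch of the general idea, assuming a minGRU-style gated recurrence h_t = (1 − z_t) · h_{t−1} + z_t · h̃_t; the variable names and this NumPy scan are my illustration, not the paper's code. Because the recurrence is linear in h_{t−1}, an associative parallel scan can evaluate the whole sequence in O(log T) passes instead of T sequential steps:

```python
# Hypothetical illustration, not the paper's code: a gated recurrence with no
# nonlinearity on the hidden state, h_t = (1 - z_t) * h_{t-1} + z_t * h~_t,
# is a linear recurrence h_t = a_t * h_{t-1} + b_t with a_t = 1 - z_t and
# b_t = z_t * h~_t, so it can be evaluated with an associative scan.
import numpy as np

def sequential(a, b):
    """Reference: h_t = a_t * h_{t-1} + b_t, one step at a time (h_0 = 0)."""
    h = np.zeros_like(b[0])
    out = []
    for a_t, b_t in zip(a, b):
        h = a_t * h + b_t
        out.append(h)
    return np.stack(out)

def parallel_scan(a, b):
    """Same values via a Hillis-Steele inclusive scan over the associative
    combine (a1, b1) o (a2, b2) = (a2 * a1, a2 * b1 + b2): O(log T) passes,
    each fully parallel across the sequence."""
    a, b = a.copy(), b.copy()
    step = 1
    while step < len(a):
        # NumPy evaluates each right-hand side before assigning, so the
        # overlapping slices read this pass's "old" values, as the scan needs.
        b[step:] = a[step:] * b[:-step] + b[step:]
        a[step:] = a[step:] * a[:-step]
        step *= 2
    return b  # with h_0 = 0, the accumulated b equals h_1..h_T

rng = np.random.default_rng(0)
T, d = 128, 4
z = rng.uniform(size=(T, d))       # gates in (0, 1)
h_cand = rng.normal(size=(T, d))   # candidate states
assert np.allclose(sequential(1 - z, z * h_cand),
                   parallel_scan(1 - z, z * h_cand))
```

On a GPU, each pass of the scan is a batched elementwise operation, which is why training such RNNs can approach the throughput of attention-based or SSM-based alternatives.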

>> These models are very small even by academic standards, so any findings would not necessarily extend to current LLM scales.

Emphasis on *not necessarily*.

>> The main conclusion is that RNN-class networks can be trained as efficiently as modern alternatives, but the resulting performance is only competitive at small scale.

Shouldn't the conclusion be "the resulting competitive performance has only been confirmed at small scale"?

  • Yes, that is indeed clearer. However, S4- and Mamba-class models have also performed well at small scale and then started lagging at larger model sizes and longer context lengths, or on particular tasks.