Comment by phkahler

1 year ago

>> These models are very small even by academic standards so any finding would not necessarily extend to current LLM scales.

Emphasis on not necessarily.

>> The main conclusion is that RNN class networks can be trained as efficiently as modern alternatives but the resulting performance is only competitive at small scale.

Shouldn't the conclusion be "the resulting competitive performance has only been confirmed at small scale"?

Yes, that is clearer. However, S4- and Mamba-class models have also performed well at small scale and then started lagging at larger model sizes, longer context lengths, or on particular tasks.