← Back to context

Comment by Smerity

1 year ago

Excited to see more people working on RNNs but wish their citations were better.

In 2016 my team from Salesforce Research published our work on the Quasi-Recurrent Neural Network[1] (QRNN). The QRNN variants we describe are near identical (minGRU) or highly similar (minLSTM) to the work here.

The QRNN was used, many years ago now, in the first version of Baidu's speech recognition system (Deep Voice [6]) and as part of Google's handwriting recognition system in Gboard[5] (2019).

Even if there are expressivity trade-offs when using parallelizable RNNs they've shown historically they can work well and are low resource and incredibly fast. Very few of the possibilities regarding distillation, hardware optimization, etc, have been explored.

Even if you need "exact" recall, various works have shown that even a single layer of attention with a parallelizable RNN can yield strong results. Distillation down to such a model is quite promising.

Other recent fast RNN variants such as the RWKV, S4, Mamba et al. include citations to QRNN (2016) and SRU (2017) for a richer history + better context.

The SRU work has also had additions in recent years (SRU++), doing well in speech recognition and LM tasks where they found similar speed benefits over Transformers.

I note this primarily as the more data points, especially when strongly relevant, the better positioned the research is. A number of the "new" findings from this paper have been previously explored - and do certainly show promise! This makes sure we're asking new questions with new insights (with all the benefit of additional research from ~8 years ago) versus missing the work from those earlier.

[1] QRNN paper: https://arxiv.org/abs/1611.01576

[2] SRU paper: https://arxiv.org/abs/1709.02755

[3]: SRU++ for speech recognition: https://arxiv.org/abs/2110.05571

[4]: SRU++ for language modeling: https://arxiv.org/abs/2102.12459

[5]: https://research.google/blog/rnn-based-handwriting-recogniti...

[6]: https://arxiv.org/abs/1702.07825