Comment by slashdave
1 year ago
Simplistic thinking. An RNN hidden parameter space of high dimension provides plenty of room for linear projections of token histories. I think people just do not realize just how huge R^N can be.
1 year ago
Simplistic thinking. An RNN hidden parameter space of high dimension provides plenty of room for linear projections of token histories. I think people just do not realize just how huge R^N can be.
> Simplistic thinking. An RNN hidden parameter space of high dimension provides plenty of room for linear projections of token histories. I think people just do not realize just how huge R^N can be.
16N bits as hard limit, but more realistically, about 2N bits or less of useful information probably.
You'd need to grow the network dimension in proportion to the maximum sequence length just to avoid the information theoretical limit.