Comment by marcinzm
2 years ago
>gzip approach not better than dnn models but mostly competes and much cheaper to run
Does it? It looks like it does worse than FastText in all the benchmarks, and kNN is not a cheap algorithm to run, so it might actually be slower than FastText as well.
edit: It looks like FastText takes 5 seconds to train on the Yahoo Answers data set while the gzip approach took them 6 days. So definitely not faster.
I'm not familiar with most of these models in detail, but training time is generally less interesting than inference time to me. I don't care if it takes a month to train on $10k of GPU rentals if it can be deployed and run on a Raspberry Pi. I should definitely look into FastText though.
As described in the paper, it didn't look like the gzip classifier trained at all. Inference involved reading the entire training set.
One could surely speed this up by preprocessing the training set and snapshotting the resulting gzip state, but that wouldn't affect the asymptotic complexity. In effect, the number of parameters equals the size of the entire training set. (Of course, lots of fancy models scale roughly like this too, so this isn't necessarily a loss.)
The gzip approach is much slower at inference time because you need to compute the gzip-compressed length of the query concatenated with every training sample (query + target). Intuitively, that should be significantly more expensive than a dot product of two embedding vectors.
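Concretely, my reading of the method is a kNN over normalized compression distance. This is a minimal sketch, not the authors' code (the space-concatenation and the k value are guesses at their setup), but it shows why each query has to compress its concatenation with all N training samples:

```python
import gzip
from collections import Counter

def compressed_len(s: str) -> int:
    # Byte length of the gzip-compressed string, i.e. C(x).
    return len(gzip.compress(s.encode("utf-8")))

def ncd(c_x: int, c_y: int, c_xy: int) -> float:
    # Normalized compression distance from the three compressed lengths.
    return (c_xy - min(c_x, c_y)) / max(c_x, c_y)

def classify(query: str, train_texts, train_labels, k=2):
    # The per-sample lengths C(x) could be cached ahead of time ("training"),
    # but the cache grows with the training set.
    cache = [compressed_len(x) for x in train_texts]
    c_q = compressed_len(query)
    # Every query still compresses its concatenation with all N training
    # samples -- this is the expensive O(N) step at inference time.
    dists = [
        ncd(c_q, c_x, compressed_len(query + " " + x))
        for x, c_x in zip(train_texts, cache)
    ]
    nearest = sorted(range(len(dists)), key=dists.__getitem__)[:k]
    # Majority vote among the k nearest training samples.
    return Counter(train_labels[i] for i in nearest).most_common(1)[0][0]
```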
3 replies →
FastText isn't an LLM, it's a token embedding model with a simple classifier on top.
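For anyone who hasn't used it, supervised FastText looks roughly like this (the file path and query text are placeholders, not from the thread):

```python
import fasttext  # pip install fasttext

# Each line of the training file is "__label__<class> <text>".
model = fasttext.train_supervised(input="yahoo_answers.train")

# Inference is an embedding lookup plus a linear layer, so it's very fast.
labels, probs = model.predict("how do I fix a flat bicycle tire?")
```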
Sure, but its existence means the statement is really "the gzip approach is not better than DNN models, and doesn't compete with or run cheaper than earlier models like FastText." That's not a very meaningful value statement for the approach (although why gzip is even half-decent might be a very interesting research question).