Comment by ozr

2 years ago

The intuition about how the gzip method works goes like so:

If you compress `ABC`, it will be X bytes. If you then compress `ABCABC`, it will take far fewer than 2X bytes. The more similar the two strings you concatenate, the fewer bytes it will take: `ABCABD` will take more than `ABCABC`, but less than `ABCXYZ`.
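You can see this effect directly with Python's stdlib `gzip` (a quick sketch; the short strings are padded out by repetition, since gzip's fixed header would otherwise swamp the differences on three-character inputs):

```python
import gzip

def clen(s: str) -> int:
    """Length in bytes of the gzip-compressed string."""
    return len(gzip.compress(s.encode()))

a = "ABC" * 50               # a repetitive base string
same = a + a                 # concatenated with an identical copy
similar = a + "ABD" * 50     # concatenated with a near-identical string
different = a + "XYZ" * 50   # concatenated with an unrelated string

# Duplicated content adds almost nothing: gzip encodes the second
# half as back-references into the first, so clen(same) << 2 * clen(a),
# and the less the suffix overlaps, the more bytes it costs.
print(clen(a), clen(same), clen(similar), clen(different))
```

The classification trick builds on exactly this: concatenate a test string with each labeled example, and the pairing whose compressed length grows the least is the most "similar" one.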

BERT is, by today's standards, a very small LLM, which we know has weaker performance than the billion-parameter-scale models most of us are interacting with today.

> very small LLM

Heh. So does that make it a MLM (medium)?

I've always found it funny that we've settled on a term for a class of models that bakes a size claim into its name... Especially given how fast things are evolving...