
Comment by huac

2 years ago

No, because the compute cost scales with the number of classes you want to classify into: with n classes, you need to run n gzip compressions at inference time. In the embedding world, you call the embedding model once on insert, and only need a dot product at inference time.
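
A rough sketch of that asymmetry, under the assumption that the gzip route keeps one reference text per class and the embedding route precomputes one vector per class (all names and texts below are illustrative, not from the original comment):

```python
import gzip
import numpy as np

# Hypothetical per-class reference texts; in practice these might be
# concatenations of labeled training examples for each class.
CLASS_TEXTS = {
    "sports": "the team won the match in overtime after a late goal",
    "finance": "the central bank raised interest rates to curb inflation",
    "tech": "the new chip doubles inference throughput at lower power",
}

def gzip_len(text: str) -> int:
    return len(gzip.compress(text.encode("utf-8")))

def classify_by_compression(query: str) -> str:
    # One gzip compression per class at inference time: cost grows with n classes.
    scores = {}
    for label, ref in CLASS_TEXTS.items():
        # How much extra compressed size the query adds on top of the class
        # reference; smaller means the query "fits" that class better.
        scores[label] = gzip_len(ref + " " + query) - gzip_len(ref)
    return min(scores, key=scores.get)

def classify_by_embedding(query_vec: np.ndarray,
                          class_vecs: dict[str, np.ndarray]) -> str:
    # Class vectors were embedded once on insert; inference is just n dot products,
    # which is far cheaper than n compressions of full texts.
    return max(class_vecs, key=lambda label: float(query_vec @ class_vecs[label]))
```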

The same logic extends to self-hosted embedding models, which tend to be about as good as Ada on most benchmarks and, yes, can be fine-tuned on your private data.
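
A minimal sketch of the self-hosted path, assuming a sentence-transformers checkpoint (the model name and reference texts are placeholders, not the commenter's setup):

```python
from sentence_transformers import SentenceTransformer
import numpy as np

# Any locally hosted sentence-transformers checkpoint works here, and it can
# later be fine-tuned on private labeled data.
model = SentenceTransformer("all-MiniLM-L6-v2")

# Embed one reference text per class once, at insert time.
labels = ["sports", "finance", "tech"]
class_texts = [
    "the team won the match in overtime after a late goal",
    "the central bank raised interest rates to curb inflation",
    "the new chip doubles inference throughput at lower power",
]
class_vecs = model.encode(class_texts, normalize_embeddings=True)

# At inference time: a single encode call for the query, then dot products.
query_vec = model.encode("the startup shipped a faster GPU kernel",
                         normalize_embeddings=True)
print(labels[int(np.argmax(class_vecs @ query_vec))])
```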