
Comment by stared

4 days ago

Technically speaking, cross-encoders are LLMs - they use the last layer to predict similarity (a single number) rather than the probability of the next token. They are faster than generative models only if they are simpler; otherwise there is no performance gain, since the cost of the last layer is negligible. In any case, even the simplest cross-encoders are more computationally intensive than bi-encoders, which reduce comparison to a dot product over pre-computed vectors.
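
For concreteness, a rough sketch of the two scoring paths with sentence-transformers (the model names here are just illustrative, not recommendations):

    from sentence_transformers import CrossEncoder, SentenceTransformer

    a, b = "red running shoes", "blue trail sneakers"

    # Cross-encoder: a full forward pass over the concatenated pair,
    # with the head emitting a single relevance score.
    ce = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
    pair_score = ce.predict([(a, b)])[0]

    # Bi-encoder: embed each text once; comparing a query against N
    # pre-computed vectors is then just N dot products.
    bi = SentenceTransformer("all-MiniLM-L6-v2")
    emb = bi.encode([a, b], normalize_embeddings=True)
    dot_score = emb[0] @ emb[1]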

That said, for many applications we may be perfectly fine with some version of a fine-tuned BERT-like model rather than the newest AGI-like SoTA, when all we need is to check whether two products are vaguely similar and whether one is worth putting in the other’s suggestions.

This is true, and I’ve done quite a bit with static embeddings. You can check out my WordLlama project if that’s interesting to you.

https://github.com/dleemiller/WordLlama
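
Basic usage looks roughly like this (see the README for the current API):

    from wordllama import WordLlama

    wl = WordLlama.load()  # default static token embeddings

    # Similarity between two texts, no transformer forward pass
    print(wl.similarity("i went to the car", "i went to the pawn shop"))

    # Rank candidate documents against a query
    print(wl.rank("i went to the car",
                  ["i went to the park", "horses are healthy animals"]))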

There’s also model2vec doing some cool things in that area, so it’s nice to see recent progress in 2024/25 on simple static embedding models.
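
A minimal model2vec sketch for comparison (the model name is one of the minishlab pre-distilled models; check their docs for current options):

    from model2vec import StaticModel

    # Pre-distilled static embedding model
    model = StaticModel.from_pretrained("minishlab/potion-base-8M")
    emb = model.encode(["red running shoes", "blue trail sneakers"])
    print(emb[0] @ emb[1])  # raw dot-product similarity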

On the computational performance note: the cross-encoder I trained using ModernBERT-base matches the quality of a RoBERTa-large cross-encoder while being about 7-8x faster. It’s still way more complex than static embeddings, but on benchmark datasets it’s much more capable too.
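
For anyone curious, training one with sentence-transformers looks roughly like this (not my exact setup - just a sketch with a toy dataset):

    from torch.utils.data import DataLoader
    from sentence_transformers import InputExample
    from sentence_transformers.cross_encoder import CrossEncoder

    # num_labels=1 puts a single-score head on top of ModernBERT-base
    model = CrossEncoder("answerdotai/ModernBERT-base", num_labels=1)

    train = [
        InputExample(texts=["query", "a relevant document"], label=1.0),
        InputExample(texts=["query", "an unrelated document"], label=0.0),
    ]
    loader = DataLoader(train, shuffle=True, batch_size=16)
    model.fit(train_dataloader=loader, epochs=1, warmup_steps=100)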