Comment by jabbany
2 years ago
> The miracle of BERT / embeddings is _not_ having to share words
To be fair, the original task was specifically chosen to be one where something like kNN + compression has a chance of being good: i.e. out-of-domain + low-resource.
Under these conditions the training inputs could be too sparse for a highly parameterized model to learn good embeddings from.
In a traditional in-domain + big-data classification setting, there's no chance that a non-parametric method like compression would beat a learned representation.
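For readers unfamiliar with what "kNN + compression" refers to, here is a minimal sketch of the gzip / normalized-compression-distance classifier being discussed; the function names and toy data are illustrative, not taken from the paper or the commenter.

```python
# Sketch of kNN + compression classification: label a test document by the
# labels of the k training documents with the smallest normalized compression
# distance (NCD), using gzip as the compressor. Names and data are illustrative.
import gzip
from collections import Counter

def clen(s: str) -> int:
    """Length of the gzip-compressed UTF-8 encoding of s."""
    return len(gzip.compress(s.encode("utf-8")))

def ncd(a: str, b: str) -> float:
    """Normalized compression distance between two strings."""
    ca, cb, cab = clen(a), clen(b), clen(a + " " + b)
    return (cab - min(ca, cb)) / max(ca, cb)

def knn_compress_predict(test_doc, train_docs, train_labels, k=3):
    """Predict a label for test_doc via k nearest neighbors under NCD."""
    dists = sorted(
        (ncd(test_doc, doc), label)
        for doc, label in zip(train_docs, train_labels)
    )
    top_k = [label for _, label in dists[:k]]
    return Counter(top_k).most_common(1)[0][0]

# Toy low-resource usage (gzip NCD is noisy on strings this short):
train_docs = ["the team won the match", "stocks fell sharply today",
              "the striker scored twice", "markets rallied on earnings"]
train_labels = ["sports", "finance", "sports", "finance"]
print(knn_compress_predict("the goalkeeper saved a penalty",
                           train_docs, train_labels, k=1))
```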