Comment by Legend2440

20 days ago

>Needing an explicit training step is a weakness that makes CL hard to make work for many other approaches.

On the other hand, not having an explicit training step is a huge weakness of KNN.

Training-based methods scale better because, once trained, their storage and runtime requirements depend on the size of the model, not the size of the dataset. You can compress 100TB of training data down into a 70GB LLM.

A KNN on the same data would require keeping around the full 100TB, and it would be intractably slow.
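
A rough sketch of that scaling difference, in Python. Everything here is illustrative: the toy data and the names `knn_predict`, `fit_centroids`, `centroid_predict`, and `make_data` are made up for this example, and a nearest-centroid classifier stands in for "a training-based method" (it is obviously not an LLM). The point is only that the brute-force KNN has to keep and scan every training point, while the trained model's stored state stays the same size no matter how much data went in.

```python
import math

def knn_predict(train, query, k=3):
    """Brute-force k-NN: keeps all of `train` and scans it on every query."""
    nearest = sorted(train, key=lambda pl: math.dist(pl[0], query))[:k]
    labels = [label for _, label in nearest]
    return max(set(labels), key=labels.count)  # majority vote

def fit_centroids(train):
    """Explicit training step: reduce the data to one mean vector per class."""
    sums, counts = {}, {}
    for point, label in train:
        acc = sums.setdefault(label, [0.0] * len(point))
        for i, x in enumerate(point):
            acc[i] += x
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}

def centroid_predict(centroids, query):
    """Inference touches only the centroids, never the training data."""
    return min(centroids, key=lambda label: math.dist(centroids[label], query))

def make_data(n):
    """Hypothetical toy data: two well-separated 2-D clusters, 2*n points."""
    data = []
    for i in range(n):
        jitter = (i % 10) / 10.0
        data.append(((jitter, jitter), "a"))
        data.append(((5.0 + jitter, 5.0 + jitter), "b"))
    return data

small, large = make_data(10), make_data(1000)
model_small, model_large = fit_centroids(small), fit_centroids(large)

# KNN state grows with the dataset; the trained model's state does not.
print(len(small), len(large))              # points KNN must store and scan
print(len(model_small), len(model_large))  # centroids the model must store
```

Both predictors give the same answers on this toy problem; the difference is in what has to be stored and scanned at query time.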

snovv_crash 19 days ago

Feature engineering is a thing; KNN doesn't need to search over the full raw data source. It's already used extensively in RAG-type lookup systems, for example.
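
A minimal sketch of what that looks like, in Python. This is not how any particular RAG system does it: `featurize`, `build_index`, `search`, the `DIM` value, and the three toy documents are all hypothetical, and a hashed bag-of-words vector stands in for whatever learned embedding a real system would use. The idea is just that the nearest-neighbour search runs over fixed-size feature vectors, not over the raw text.

```python
import math
import zlib

DIM = 256  # assumed feature size, picked arbitrarily for this example

def featurize(text, dim=DIM):
    """Hypothetical feature step: hashed bag-of-words, L2-normalised."""
    vec = [0.0] * dim
    for word in text.lower().split():
        # crc32 gives a hash that is stable across runs, unlike built-in hash()
        vec[zlib.crc32(word.encode("utf-8")) % dim] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def build_index(docs):
    """Store only (id, feature vector) pairs; the raw text is not kept here."""
    return [(doc_id, featurize(text)) for doc_id, text in docs.items()]

def search(index, query, k=1):
    """k-NN by cosine similarity (vectors are already unit length)."""
    q = featurize(query)
    scored = [(sum(a * b for a, b in zip(q, vec)), doc_id) for doc_id, vec in index]
    return [doc_id for _, doc_id in sorted(scored, reverse=True)[:k]]

docs = {
    "knn": "nearest neighbour search compares a query against stored points",
    "llm": "a language model compresses training text into fixed weights",
    "rag": "retrieval augmented generation looks up passages for a prompt",
}
index = build_index(docs)
top = search(index, "how does retrieval augmented generation look up passages")
print(top)
```

Each index entry is `DIM` floats regardless of how long the source document is, so the search cost depends on the feature size and the number of entries, not on the raw data volume. Large deployments typically also swap the linear scan for an approximate nearest-neighbour index.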