
Comment by crystal_revenge

15 hours ago

> LLMs are not feasible when you have a dataset of 10 million items that you need to classify relatively fast and at a reasonable cost.

What? That's simply not true.

Current embedding models are incredibly fast and cheap and will, in the vast majority of NLP tasks, get you far better results than any local set of features you can develop yourself.

I've also done this at work numerous times, and have been working on various NLP tasks for over a decade now. For traditional NLP tasks going forward, the first pass is going to be to fetch LLM embeddings and stick a fairly simple classification model on top.
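To make that concrete, here is a rough sketch of the "embeddings plus a simple classifier" pattern. It rests on several assumptions: `toy_embed` is a hypothetical stand-in (a hashed bag of words) so the snippet runs with only the standard library, and in practice you would swap in a call to a real embedding model. `CentroidClassifier` is just one example of a "fairly simple" model, and logistic regression on the same vectors would be an equally reasonable choice.

```python
import hashlib
import math
from typing import Callable, Dict, List, Sequence

Vector = List[float]
DIM = 64  # arbitrary size for the stand-in embedder (assumption)


def toy_embed(text: str) -> Vector:
    """Hypothetical placeholder for a real embedding model.

    Hashes each lowercased token into one of DIM buckets, then
    L2-normalizes. Replace this with a real embedding call in practice.
    """
    vec = [0.0] * DIM
    for token in text.lower().split():
        digest = hashlib.md5(token.encode("utf-8")).digest()
        vec[int.from_bytes(digest[:4], "big") % DIM] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


class CentroidClassifier:
    """Nearest-centroid classifier on top of any embedding function."""

    def __init__(self, embed: Callable[[str], Vector]):
        self.embed = embed
        self.centroids: Dict[str, Vector] = {}

    def fit(self, texts: Sequence[str], labels: Sequence[str]) -> "CentroidClassifier":
        sums: Dict[str, Vector] = {}
        counts: Dict[str, int] = {}
        for text, label in zip(texts, labels):
            vec = self.embed(text)
            if label not in sums:
                sums[label] = [0.0] * len(vec)
                counts[label] = 0
            sums[label] = [a + b for a, b in zip(sums[label], vec)]
            counts[label] += 1
        # Each class is represented by the mean of its embeddings.
        self.centroids = {
            label: [x / counts[label] for x in total]
            for label, total in sums.items()
        }
        return self

    def predict(self, texts: Sequence[str]) -> List[str]:
        predictions = []
        for text in texts:
            vec = self.embed(text)
            # Pick the class whose centroid has the largest dot product.
            best = max(
                self.centroids,
                key=lambda label: sum(
                    a * b for a, b in zip(vec, self.centroids[label])
                ),
            )
            predictions.append(best)
        return predictions


if __name__ == "__main__":
    train_texts = ["refund my invoice", "charged twice this month",
                   "app crashes on login", "button does nothing"]
    train_labels = ["billing", "billing", "bug", "bug"]
    clf = CentroidClassifier(toy_embed).fit(train_texts, train_labels)
    print(clf.predict(["please refund my invoice"]))
```

The point of the design is that the expensive part (embedding) is a single batchable pass over the data, and the classifier on top is cheap to train and cheap to run.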

> One prompt? Fair. 10? Still ok. 100? You're pushing it. 10M - get help.

"Prompting" is not how you use LLMs for classification tasks. Sure, you can build zero-shot classifiers for some tricky tasks, but if you're classifying documents today and you're not starting with an embedding model, you're missing some easy gains.