Comment by sireat
10 hours ago
So what would you use to classify whether each of 1M documents in a non-English language is a critique or something else?
This is a real problem I am dealing with on a library project.
Each document is between 100 and 10k tokens.
Most top (read: most expensive) LLMs available on OpenRouter work great; it is the cost (and speed) that is the issue.
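To put rough numbers on the cost concern, here is a back-of-the-envelope sketch. The corpus size and per-document token range are the figures given above; the price per million tokens is a purely hypothetical placeholder, not a quote from any provider, so substitute the real rate of whichever model you are considering.

```python
# Rough token volume and cost bounds for classifying the whole corpus.
NUM_DOCS = 1_000_000                    # corpus size from the comment
MIN_TOKENS, MAX_TOKENS = 100, 10_000    # per-document range from the comment

# ASSUMPTION: hypothetical input price in USD per million tokens.
PRICE_PER_MILLION_TOKENS_USD = 1.00


def total_tokens(num_docs: int, tokens_per_doc: int) -> int:
    """Total input tokens if every document has tokens_per_doc tokens."""
    return num_docs * tokens_per_doc


def cost_usd(tokens: int, price_per_million: float) -> float:
    """Input cost in USD for the given token count."""
    return tokens / 1_000_000 * price_per_million


low = total_tokens(NUM_DOCS, MIN_TOKENS)    # every document at the minimum
high = total_tokens(NUM_DOCS, MAX_TOKENS)   # every document at the maximum

print(f"tokens: {low:,} to {high:,}")
print(
    f"cost at ${PRICE_PER_MILLION_TOKENS_USD:.2f} per million tokens: "
    f"${cost_usd(low, PRICE_PER_MILLION_TOKENS_USD):,.0f} to "
    f"${cost_usd(high, PRICE_PER_MILLION_TOKENS_USD):,.0f}"
)
```

The bounds are two orders of magnitude apart, so the actual length distribution of the documents matters as much as the model price. Output tokens and prompt overhead are ignored here.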
If I could come up with something locally runnable, that would be fantastic.
Presumably BERT-based classifiers would work if I had one properly trained for the language.
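One cheap thing to try before training anything BERT-sized is sketched below. Note that this is not a BERT classifier: it is a much simpler stand-in, a character n-gram Naive Bayes model, which needs no language-specific tokenizer. The assumption is that a small sample gets labeled first (for example with one of the expensive LLMs) and the local model is trained on that sample. The class name, the toy texts, and the labels are all made up for illustration, and whether such a baseline is accurate enough for this task is untested.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n=3):
    """Overlapping character n-grams; no language-specific tokenizer needed."""
    text = text.lower()
    return [text[i:i + n] for i in range(len(text) - n + 1)]


class NaiveBayes:
    """Multinomial Naive Bayes over character n-grams (hypothetical helper)."""

    def __init__(self, n=3):
        self.n = n
        self.counts = defaultdict(Counter)   # label -> n-gram counts
        self.doc_counts = Counter()          # label -> number of training docs
        self.vocab = set()                   # all n-grams seen in training

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            grams = char_ngrams(text, self.n)
            self.counts[label].update(grams)
            self.doc_counts[label] += 1
            self.vocab.update(grams)
        return self

    def predict(self, text):
        grams = char_ngrams(text, self.n)
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label in self.doc_counts:
            total = sum(self.counts[label].values())
            # Log prior plus add-one smoothed log likelihood of each n-gram.
            score = math.log(self.doc_counts[label] / total_docs)
            for gram in grams:
                score += math.log(
                    (self.counts[label][gram] + 1) / (total + vocab_size)
                )
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# ASSUMPTION: toy, made-up training sample standing in for LLM-labeled documents.
train_texts = [
    "a critical review arguing that the novel fails to convince",
    "the reviewer critiques the weak structure of the argument",
    "opening hours and contact information for visitors",
    "catalogue record listing title, publisher and year",
]
train_labels = ["critique", "critique", "other", "other"]

model = NaiveBayes(n=3).fit(train_texts, train_labels)
print(model.predict("a review that critiques the argument"))
```

If a baseline like this turns out to be too weak, the same labeled sample could be reused to fine-tune a proper encoder model, so the labeling effort is not wasted.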
I guess you've already seen https://huggingface.co/collections/answerdotai/modernbert-67... ?