Comment by nl

17 hours ago

If you are going to go to the bother of fine-tuning for trivial problems like subject classification, then I think you'll find scikit-learn with an SGDClassifier on 2-grams will probably do just as well and be under 1MB for the trained classifier.

You can train it in under a minute, and it will work perfectly well on embedded devices.
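A minimal sketch of that kind of setup, for illustration only. The toy texts and labels are invented, and the settings are assumptions rather than anything from the article: `ngram_range=(1, 2)` (unigrams plus 2-grams) and SGDClassifier's default hinge loss, which amounts to a linear SVM.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import make_pipeline

# Toy data, invented for illustration only.
train_texts = [
    "striker scored late goal",
    "goalkeeper saved penalty kick",
    "coach praised striker after match",
    "compiler emits faster binary code",
    "kernel patch fixes memory leak",
    "database index speeds query planner",
]
train_labels = ["sports", "sports", "sports", "tech", "tech", "tech"]

# Bag of unigrams and 2-grams feeding a linear model trained with SGD.
# ngram_range=(1, 2) is an assumption; (2, 2) would use 2-grams only.
clf = make_pipeline(
    CountVectorizer(ngram_range=(1, 2)),
    SGDClassifier(max_iter=1000, tol=1e-3, random_state=0),
)
clf.fit(train_texts, train_labels)

# Classify unseen text; words and 2-grams not seen in training are ignored.
predictions = clf.predict(["striker scored penalty", "kernel memory leak"])
print(list(predictions))
```

The fitted pipeline can be saved with `joblib.dump(clf, "classifier.joblib")`; how large the file ends up depends on vocabulary size, so the size is worth checking on real data.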

Small LLMs are good choices for text classification in two cases:

- If you need to provide in-context examples and classify based on them.

- Your classification goes beyond simple subject-type classifiers. For example, multiple choice question answering is classification where a small LLM will work but traditional ML methods won't.

9 comments

djsjajah 16 hours ago

Not with 800 examples. If you are going to consider an ngram model, I think you are better off getting a frontier llm to write you an absurd regex.

nl 12 hours ago
Hmm maybe. Turns out the author trained a logistic-regression classifier on the embeddings too, but didn't report the results:
https://github.com/thelgevold/fine-tuned-classifier/blob/mai...
- dev-experiments 1 hour ago
  
  Expanding on this experiment using logistic regression is an interesting continuation, detailed here: https://www.teachmecoolstuff.com/viewarticle/using-logistic-...
  In summary: using logistic regression improves not only accuracy but also performance, both at runtime and during training.
IanCal 10 hours ago

I would also recommend the approach of using an llm to create the examples, and then train from there.
You can even get fancy and do things like active learning with the llm taking the role of the human annotator and sending in trial statements (and you can use a cheap one for larger gen and a more expensive one for the classification).
I’d be interested in seeing how well LLMs do at writing code for the kind of thing Snorkel AI used to have. There was open-source code a while back that I assume is still around somewhere: you wrote a set of low-quality classifiers and it trained a model around them.
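
A rough sketch of the "low quality set of classifiers" idea, for illustration only. This is not Snorkel's API: every function name here is hypothetical, the keyword rules are invented, and a plain majority vote stands in for the model Snorkel would train over the noisy votes.

```python
from collections import Counter

ABSTAIN = None  # a labeling function returns this when it has no opinion


def lf_sports_words(text):
    words = set(text.lower().split())
    return "sports" if words & {"goal", "match", "striker"} else ABSTAIN


def lf_tech_words(text):
    words = set(text.lower().split())
    return "tech" if words & {"compiler", "kernel", "database"} else ABSTAIN


def lf_has_scoreline(text):
    # Crude heuristic: a token like "3-1" hints at a sports result.
    for token in text.split():
        left, sep, right = token.partition("-")
        if sep and left.isdigit() and right.isdigit():
            return "sports"
    return ABSTAIN


LABELING_FUNCTIONS = [lf_sports_words, lf_tech_words, lf_has_scoreline]


def weak_label(text, lfs=LABELING_FUNCTIONS):
    """Combine noisy votes by majority; abstain on no votes or a tie."""
    votes = [vote for vote in (lf(text) for lf in lfs) if vote is not ABSTAIN]
    if not votes:
        return ABSTAIN
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN
    return ranked[0][0]


unlabeled = [
    "The match ended 3-1",
    "New kernel release fixes a database bug",
    "Nothing relevant here",
    "The compiler team watched the match",
]
weak_labels = [weak_label(t) for t in unlabeled]
```

Texts that get a non-abstain label can then be used as training data for an ordinary classifier, and an LLM could plausibly be asked to draft the labeling functions themselves.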

zubiaur 5 hours ago

A small transformer like BERT or variants is a better fit. It only takes a few examples, which can be generated synthetically using an LLM.

Trains quickly and classifies speedily on modern hardware.

Had a lot of fun doing stuff like this years ago, before LLMs were a thing.

brokensegue 14 hours ago

there are models between 2-grams and 600m param models that would be good options. i don't expect a 2-gram to do very well here. also i'm not sure why this model isn't a fine choice if it solves their problem

throwa356262 13 hours ago
What would you suggest instead?
- stephantul 10 hours ago
  
  A non-autoregressive transformer trained with a classification objective.
  