Comment by djsjajah
18 hours ago
Not with 800 examples. If you are going to consider an n-gram model, I think you are better off getting a frontier LLM to write you an absurd regex.
Hmm maybe. Turns out the author trained a logistic-regression classifier on the embeddings too, but didn't report the results:
https://github.com/thelgevold/fine-tuned-classifier/blob/mai...
Expanding on this experiment using logistic regression is an interesting continuation, detailed here: https://www.teachmecoolstuff.com/viewarticle/using-logistic-...
In summary: logistic regression actually improves accuracy, and it also improves performance at both training time and runtime.
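For anyone who wants to see the shape of the idea, here is a minimal sketch of logistic regression trained on fixed embeddings. It is not the author's code: the 3-dimensional vectors are toy stand-ins for real embeddings, and `train_logreg` and `predict` are names made up for illustration.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_logreg(X, y, lr=0.5, epochs=500):
    # Full-batch gradient descent on the mean log loss.
    dim, n = len(X[0]), len(X)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(dim):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict(w, b, x):
    # Class 1 if the predicted probability is at least 0.5.
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5 else 0

# Assumption: these rows would come from a frozen embedding model in practice.
X = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1], [0.1, 0.9, 0.2], [0.0, 0.8, 0.3]]
y = [1, 1, 0, 0]

w, b = train_logreg(X, y)
preds = [predict(w, b, x) for x in X]
```

The embedding model stays frozen, so the only thing trained is one weight vector and a bias, which is presumably why it is cheap at both ends.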
I would also recommend using an LLM to create the examples and then training from there.
You can even get fancy and do things like active learning, with the LLM taking the role of the human annotator and sending in trial statements (you can use a cheap model for the bulk generation and a more expensive one for the classification).
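A rough sketch of what that loop could look like, using uncertainty sampling (label whichever pool item the current model is least sure about). Everything here is hypothetical: `llm_annotate` is a rule-based placeholder where the expensive LLM call would go, `featurize` is a keyword counter standing in for embeddings, and the pool is hard-coded where the cheap LLM's generated statements would be.

```python
import math

def featurize(text):
    # Hypothetical stand-in for an embedding: two keyword-count features.
    t = text.lower()
    return [t.count("refund") + t.count("broken"),
            t.count("thanks") + t.count("great")]

def llm_annotate(text):
    # Placeholder for the expensive LLM call; a fixed rule so this runs offline.
    t = text.lower()
    return 1 if ("refund" in t or "broken" in t) else 0

def prob(w, b, x):
    z = sum(wj * xj for wj, xj in zip(w, x)) + b
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))

def fit(X, y, lr=0.5, epochs=300):
    # Tiny logistic regression trained with per-example gradient steps.
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            err = prob(w, b, xi) - yi
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b

def active_learn(seed, pool, rounds):
    labeled = [(t, llm_annotate(t)) for t in seed]
    pool = list(pool)
    for _ in range(rounds):
        if not pool:
            break
        w, b = fit([featurize(t) for t, _ in labeled],
                   [label for _, label in labeled])
        # Uncertainty sampling: pick the item whose probability is closest to 0.5.
        pick = min(pool, key=lambda t: abs(prob(w, b, featurize(t)) - 0.5))
        pool.remove(pick)
        labeled.append((pick, llm_annotate(pick)))
    return labeled, pool

seed = ["I want a refund", "thanks, great service"]
pool = ["the item arrived broken", "great, thanks a lot",
        "where is my order", "refund please, it is broken"]
labeled, remaining = active_learn(seed, pool, rounds=3)
```

The point of the split is that the expensive model is only called once per selected item, not once per generated candidate.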
I’d be interested in seeing how well LLMs do at writing the kind of code Snorkel AI used to rely on. There was open-source code a while back that I assume is still around somewhere: you wrote a set of low-quality classifiers as code, and it trained a model around those.
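To make the "low-quality classifiers as code" idea concrete, here is a small sketch of labeling functions combined by majority vote. This is not the Snorkel API, and as far as I recall Snorkel fits a learned label model over the votes; the plain majority vote below is a simpler stand-in. The spam/ham task and the `lf_*` heuristics are invented for illustration, and are exactly the sort of thing you might ask an LLM to write.

```python
ABSTAIN, HAM, SPAM = -1, 0, 1

# Each labeling function is a cheap, noisy heuristic that may abstain.
def lf_has_link(text):
    return SPAM if "http" in text else ABSTAIN

def lf_shouting(text):
    return SPAM if text.isupper() else ABSTAIN

def lf_says_thanks(text):
    return HAM if "thanks" in text.lower() else ABSTAIN

LFS = [lf_has_link, lf_shouting, lf_says_thanks]

def majority_vote(text, lfs=LFS):
    # Combine the non-abstaining votes; ties and all-abstain give ABSTAIN.
    votes = [v for v in (lf(text) for lf in lfs) if v != ABSTAIN]
    spam = sum(1 for v in votes if v == SPAM)
    ham = len(votes) - spam
    if spam == ham:
        return ABSTAIN
    return SPAM if spam > ham else HAM

texts = [
    "BUY NOW at http://spam.example",
    "thanks, that fixed it",
    "see you tomorrow at the meeting",
    "thanks, details at http://a.example",
    "FREE PRIZES CLICK HERE",
]
weak_labels = [majority_vote(t) for t in texts]

# Only the examples with a non-abstain label go on to train the real model.
train_set = [(t, l) for t, l in zip(texts, weak_labels) if l != ABSTAIN]
```

The resulting `train_set` is what you would then feed to a real classifier, such as logistic regression on embeddings.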