Comment by thesz
19 days ago
> like graph relationships
Once upon a time, when I was a language modeling researcher, I built and fine-tuned a big (at the time, about 5 billion parameters) Sparse Non-Negative Matrix Language Model [1].
[1] https://aclanthology.org/Q16-1024/
As this model allows for mixing and matching various contexts, one thing I did was use a word-sorted context. This effectively transforms a position-based context into a word-set-based context, where "you and me", "me and you" and "and me you" are all the same.
This allowed for longer contexts and better prediction.
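For what it's worth, here is a minimal sketch of that idea (my own illustration, not code from the SNM paper): sorting the context tokens yields an order-insensitive key, so every permutation of the same words maps to the same feature.

```python
# Sketch only: collapse position-based contexts into word-set contexts
# by sorting tokens, so permutations share one count/feature.
from collections import defaultdict

def sorted_context_key(context_tokens):
    """Map a context to an order-insensitive key."""
    return tuple(sorted(context_tokens))

# Toy counts: all three permutations contribute to the same feature.
counts = defaultdict(int)
for ctx in [("you", "and", "me"), ("me", "and", "you"), ("and", "me", "you")]:
    counts[sorted_context_key(ctx)] += 1

print(counts)  # defaultdict(<class 'int'>, {('and', 'me', 'you'): 3})
```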
I've saved it to look at in the future. I also remembered Kristina Toutanova's name (your editor). Looking up her recent publications, she's done interesting work on analyzing pretraining mixtures.
https://aclanthology.org/2025.acl-long.1564/
Thanks to you both for two interesting papers tonight. :)
I am not an author of the SNMLM paper. ;)
I was using their model in my work.
I misunderstood what you said.
Well, in your work, what benefit did you get from it? And do you think it would be beneficial today combined with modern techniques? Or has it been obsoleted by other techniques?
(I ask because I'm finding many old techniques are still good or could be mixed with deep learning.)