Comment by nickpsecurity

19 days ago

I've saved it to look at in the future. I also remembered Kristina Toutanova's name (your editor). Looking up her recent publications, she's done interesting work on analyzing pretraining mixtures.

https://aclanthology.org/2025.acl-long.1564/

Thanks to you both for two interesting papers tonight. :)

I am not an author of the SNMLM paper. ;)

I was using their model in my work.

  • I misunderstood what you said.

    Well, in your work, what benefit did you get from it? And do you think it would still be beneficial today, combined with modern techniques? Or has it been obsoleted by other techniques?

    (I ask because I'm finding that many old techniques are still good or can be mixed with deep learning.)

    • At the time (2018), it had perplexity close to an LSTM's, while having more coefficients and a much shorter training time (hours vs. days).

      I tried to apply SNMLM's ideas to byte-level prediction modeling here: https://github.com/thesz/snmlm-per-byte

      It was not bad, but I had trouble scaling it to the 1B set, mostly because I did not have enough time.
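
      For readers who don't know SNM models, here is a minimal byte-level sketch of the core idea, not the snmlm-per-byte code itself. It assumes the sparse non-negative matrix formulation: the probability of the next byte is a normalized sum of non-negative weights, one per sparse context feature (here, byte suffixes up to an assumed order of 4). Raw n-gram counts serve as the weights, which is only the usual initialization; the real model learns per-feature adjustments on top.

        # Minimal sketch of sparse non-negative matrix language modeling
        # (SNMLM) at the byte level. MAX_ORDER and the count-based weights
        # are assumptions for illustration, not the snmlm-per-byte setup.
        from collections import defaultdict
        import math

        MAX_ORDER = 4   # context features: byte suffixes of length 1..4
        VOCAB = 256     # byte-level vocabulary

        weights = defaultdict(float)         # sparse M[(feature, byte)] -> weight
        feature_totals = defaultdict(float)  # sum of M[f, .] per feature f

        def features(ctx: bytes):
            """Sparse features of a context: its byte suffixes up to MAX_ORDER."""
            return [ctx[len(ctx) - k:] for k in range(1, min(MAX_ORDER, len(ctx)) + 1)]

        def train(data: bytes):
            for i in range(1, len(data)):
                ctx = data[max(0, i - MAX_ORDER):i]
                for f in features(ctx):
                    weights[(f, data[i])] += 1.0
                    feature_totals[f] += 1.0

        def prob(ctx: bytes, b: int) -> float:
            """P(b | ctx): normalized sum of non-negative feature weights."""
            num = 1.0 / VOCAB   # uniform floor keeps unseen events nonzero
            den = 1.0
            for f in features(ctx):
                num += weights.get((f, b), 0.0)
                den += feature_totals.get(f, 0.0)
            return num / den

        def bits_per_byte(data: bytes) -> float:
            bits = 0.0
            for i in range(1, len(data)):
                bits -= math.log2(prob(data[max(0, i - MAX_ORDER):i], data[i]))
            return bits / (len(data) - 1)

        text = open(__file__, "rb").read()   # toy corpus: this script itself
        train(text)
        print(f"{bits_per_byte(text):.3f} bits/byte on the training data")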

      I hold the same mindset as you: many old techniques are misunderstood or underapplied. For example, in my experiments, decision trees achieve bits per byte comparable to an LSTM (lstm-compress, or the LSTM in the nncp experiments): https://github.com/thesz/codeta
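
      To make the bits-per-byte measurement concrete, here is a rough sketch, not the codeta implementation linked above: fit a decision tree that predicts the next byte from the preceding bytes, then score its held-out cross-entropy. The context width, tree parameters, and probability floor are all placeholder assumptions.

        # Rough sketch: decision-tree next-byte prediction scored in bits
        # per byte. Not the codeta method; CONTEXT and min_samples_leaf are
        # placeholder choices. Compression-grade results would need online
        # training and smoothed leaf distributions.
        import numpy as np
        from sklearn.tree import DecisionTreeClassifier

        CONTEXT = 8  # how many preceding bytes to use as features

        def windows(data: bytes):
            X = [list(data[i - CONTEXT:i]) for i in range(CONTEXT, len(data))]
            y = [data[i] for i in range(CONTEXT, len(data))]
            return np.array(X), np.array(y)

        data = open(__file__, "rb").read()   # toy corpus: this script itself
        split = int(len(data) * 0.8)
        X_tr, y_tr = windows(data[:split])
        X_te, y_te = windows(data[split:])

        tree = DecisionTreeClassifier(min_samples_leaf=4).fit(X_tr, y_tr)

        # Cross-entropy in bits per byte, flooring unseen byte classes.
        proba = tree.predict_proba(X_te)
        col = {c: j for j, c in enumerate(tree.classes_)}
        eps = 1e-6
        bits = 0.0
        for probs, target in zip(proba, y_te):
            p = probs[col[target]] if target in col else 0.0
            bits -= np.log2(max(p, eps))
        print(f"{bits / len(y_te):.3f} bits/byte on held-out data")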