Comment by btown

9 hours ago

This is Python; finding new ways to subscript into things directly is a graduate student’s favorite pastime!

In all seriousness I think that encoding-independent constant-time substring extraction has been meaningful in letting researchers outside the U.S. prototype, especially in NLP, without worrying about their abstractions around “a 5 character subslice” being more complicated than that. Memory is a tradeoff, but a reasonably predictable one.