Comment by curioussquirrel

4 months ago

> Yes, but it would hurt its contextual understanding and effectively reduce the context window several times.

Only in the currently most popular architectures. Mamba- and RWKV-style LLMs may suffer a bit, but they don't get a reduced context window in the same sense: they compress history into a fixed-size recurrent state rather than attending over every token, so there is no hard window for the extra byte-level tokens to eat into.
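
To make that contrast concrete, here is a toy sketch (illustration only; the shapes are arbitrary and the EMA update is a stand-in, not the actual Mamba/RWKV recurrence): an attention model's KV cache grows with every token it sees, while a recurrent-style model folds each token into a state of constant size.

```python
import numpy as np

d = 8                # hidden size (arbitrary for the sketch)
kv_cache = []        # attention: one cached entry per token, grows with length
state = np.zeros(d)  # recurrent: fixed-size state, independent of length

# Feed 1000 "byte-level tokens": the cache grows 1000x, the state does not.
for token_emb in np.random.randn(1000, d):
    kv_cache.append(token_emb)             # memory/cost scales with sequence length
    state = 0.9 * state + 0.1 * token_emb  # stand-in linear recurrence, stays shape (d,)

print(len(kv_cache))  # 1000 entries
print(state.shape)    # (8,) no matter how long the input was
```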

  • You're right. There was also an experiment at Meta that tokenized bytes directly, and it didn't hurt performance much in very small models.
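
For a rough sense of the sequence-length inflation being discussed, you can compare UTF-8 byte counts against a common BPE vocabulary. The snippet below uses tiktoken's cl100k_base purely as a convenient example; any BPE tokenizer shows the same effect.

```python
# pip install tiktoken
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # a widely used BPE vocabulary
text = "Byte-level models see one token per byte, so sequences get several times longer."
n_bytes = len(text.encode("utf-8"))
n_bpe = len(enc.encode(text))
print(f"UTF-8 bytes: {n_bytes}, BPE tokens: {n_bpe}, inflation: {n_bytes / n_bpe:.1f}x")
# Typical English text lands around 3-5 bytes per BPE token, which is the
# "several times" reduction in effective context mentioned above.
```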