Comment by IncreasePosts

4 months ago

Wouldn't an LLM that just tokenized by character be good at it?

I asked this in another thread, and the answer was that it would only be better with unlimited compute and memory.

Without those, the LLM needs far more parameters and ends up with a much smaller effective context window.

In a theoretical world, it would be better, but might not be much better.

Yes, but it would hurt its contextual understanding and effectively shrink the context window severalfold, since each position would hold one character instead of a multi-character token.

  • Only in the currently most popular architectures. Mamba- and RWKV-style LLMs may suffer a bit but don't get a reduced context in the same sense.

    • You're right. There was also an experiment at Meta that tokenized bytes directly, and it didn't hurt performance much in very small models.
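
For anyone who wants the context-window point from this thread as numbers, here is a rough Python sketch. It assumes a subword token covers about 4 characters of English text (a common rule of thumb, not a measurement of any specific tokenizer) and uses a hypothetical 8192-position window. The quadratic cost only applies to standard self-attention, which is why Mamba- and RWKV-style models are affected differently. The function names are made up for illustration.

```python
# Rough arithmetic only. The ratio below is an assumed rule of thumb,
# not a property of any particular tokenizer.
ASSUMED_CHARS_PER_SUBWORD_TOKEN = 4


def text_covered(window_positions: int, chars_per_position: int) -> int:
    """Characters of raw text that fit in a window of the given size."""
    return window_positions * chars_per_position


def relative_attention_cost(seq_len_ratio: int) -> int:
    """Standard self-attention cost grows with the square of sequence length."""
    return seq_len_ratio ** 2


def byte_tokens(text: str) -> list[int]:
    """Byte-level 'tokenization': one token per UTF-8 byte (vocabulary of 256)."""
    return list(text.encode("utf-8"))


window = 8192  # hypothetical context window, in positions

subword_chars = text_covered(window, ASSUMED_CHARS_PER_SUBWORD_TOKEN)
char_level_chars = text_covered(window, 1)

# How many times less text the character-level model sees in the same window.
print(subword_chars // char_level_chars)

# How much more attention compute it takes to cover the same text per character.
print(relative_attention_cost(ASSUMED_CHARS_PER_SUBWORD_TOKEN))

# Non-ASCII characters take more than one byte, so byte-level sequences
# can be even longer than character-level ones.
print(byte_tokens("héllo"))
```

Under those assumptions, the same window holds about a quarter of the text, and covering the same text with full attention costs roughly 16 times as much.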