Comment by IncreasePosts

11 hours ago

Wouldn't an LLM that just tokenized by character be good at it?

Yes, but it would hurt its contextual understanding and effectively shrink the context window severalfold, since the same text takes several times as many tokens.

  • Only in the currently most popular architectures. Mamba- and RWKV-style LLMs may suffer a bit, but they don't get a reduced context in the same sense.
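To put rough numbers on the context-window point above, here is a back-of-the-envelope sketch. The 8192-token window and the figure of about 4 characters per subword token are assumptions for illustration (a commonly quoted rough average for English text), not measurements of any particular model. The quadratic cost applies to standard self-attention only, which is why architectures like Mamba and RWKV are affected differently.

```python
# Back-of-the-envelope comparison of subword vs. character-level tokenization.
# All constants below are illustrative assumptions, not measured values.

WINDOW_TOKENS = 8192            # hypothetical context window, in tokens
SUBWORD_CHARS_PER_TOKEN = 4.0   # assumed rough average for English subword tokens
CHAR_CHARS_PER_TOKEN = 1.0      # character-level: exactly one character per token


def window_in_chars(window_tokens: int, chars_per_token: float) -> int:
    """Approximate number of characters of text that fit in the window."""
    return int(window_tokens * chars_per_token)


def relative_attention_cost(token_multiplier: float) -> float:
    """Standard self-attention compares every token pair, so cost grows
    with the square of the sequence length."""
    return token_multiplier ** 2


subword_chars = window_in_chars(WINDOW_TOKENS, SUBWORD_CHARS_PER_TOKEN)
char_chars = window_in_chars(WINDOW_TOKENS, CHAR_CHARS_PER_TOKEN)

# How many times less text the same window holds at character level.
shrink_factor = subword_chars / char_chars

# Extra attention cost to cover the same text at character level.
attention_cost_factor = relative_attention_cost(shrink_factor)

print(f"subword window covers ~{subword_chars} characters")
print(f"character window covers ~{char_chars} characters")
print(f"effective context shrinks ~{shrink_factor:.0f}x")
print(f"attention cost for the same text grows ~{attention_cost_factor:.0f}x")
```

Under these assumptions the same window holds about a quarter of the text, and covering the original amount of text with quadratic attention costs roughly 16 times as much.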

I asked this in another thread, and the answer was that it would only be better with unlimited compute and memory.

Without those, the LLM has to spend way more parameters and ends up with a way smaller effective context window.

In theory it would be better, but maybe not by much.