Comment by Hendrikto
1 year ago
No, actually far fewer tokens. 256 tokens cover all bytes. See the ByT5 paper: https://arxiv.org/abs/2105.13626
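A minimal sketch of the idea, assuming each token id is just the raw UTF-8 byte value (real byte-level tokenizers like ByT5's shift ids to make room for special tokens, but the vocabulary-size point is the same):

```python
# Byte-level tokenization: every UTF-8 byte maps directly to one of
# 256 token ids, so no learned subword vocabulary is needed.
def byte_tokenize(text: str) -> list[int]:
    return list(text.encode("utf-8"))  # one token per byte

def byte_detokenize(ids: list[int]) -> str:
    return bytes(ids).decode("utf-8")

ids = byte_tokenize("héllo")
print(ids)                   # [104, 195, 169, 108, 108, 111]
print(max(ids) < 256)        # True: 256 ids cover every possible byte
print(byte_detokenize(ids))  # héllo
```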
More tokens per sequence, though. And since it is learning sequences...
Yeah, suddenly 16k tokens is just 16 KB of ASCII instead of ~6k words
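A back-of-the-envelope sketch of that trade-off. The chars-per-word and subword-tokens-per-word ratios below are assumptions chosen to match the "~6k words" estimate above; real subword tokenizers vary by model and language:

```python
# How much text fits in a 16k-token context window under byte-level
# vs. subword tokenization.
CONTEXT = 16_384               # tokens in the context window
CHARS_PER_WORD = 6             # ~5 letters + a space, rough English average
SUBWORD_TOKENS_PER_WORD = 2.7  # assumption matching the "~6k words" figure

# Byte-level: one token per ASCII byte, i.e. one token per character.
ascii_chars = CONTEXT
print(f"byte-level: {ascii_chars:,} chars ≈ {ascii_chars // CHARS_PER_WORD:,} words")

# Subword: several bytes of text per token on average.
print(f"subword:    ≈ {int(CONTEXT / SUBWORD_TOKENS_PER_WORD):,} words")

# byte-level: 16,384 chars ≈ 2,730 words
# subword:    ≈ 6,068 words
```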