kenjackson 3 months ago
Why so much compute? Can you tie it to the problem?

typpilol 3 months ago
Tokenizers are the reason LLMs can run at a decent speed on our best hardware at all.
Removing the tokenizer would cut the context to 1/4 and 4x the compute and memory, assuming an average token length of 4 characters.
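A quick back-of-the-envelope sketch of that arithmetic (pure Python, no real tokenizer involved; the 4-characters-per-token average and the 8192-token window are just illustrative assumptions):

    # Rough sketch of the token-count arithmetic above. No real tokenizer is
    # used; a "BPE-like" token is assumed to cover ~4 characters on average.
    AVG_CHARS_PER_TOKEN = 4   # assumption from the comment, not a measured figure
    CONTEXT_TOKENS = 8192     # hypothetical context window size

    text = "The quick brown fox jumps over the lazy dog. " * 1000

    bpe_like_tokens = len(text) / AVG_CHARS_PER_TOKEN  # ~4 chars per token
    char_level_tokens = len(text)                      # 1 char per token

    print(f"BPE-like tokens:   {bpe_like_tokens:,.0f}")
    print(f"Char-level tokens: {char_level_tokens:,.0f} "
          f"({char_level_tokens / bpe_like_tokens:.0f}x more)")

    # The same token budget now covers only 1/4 as much text:
    print(f"Text covered by a {CONTEXT_TOKENS}-token window: "
          f"{CONTEXT_TOKENS * AVG_CHARS_PER_TOKEN:,} chars tokenized vs "
          f"{CONTEXT_TOKENS:,} chars character-level")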
Also, you would probably need roughly 4x the parameters, since the model has to learn relationships between individual characters as well as between words and sentences.
There have been a few studies on small tokenizer-free models, and even those only show a tiny percentage gain over tokenized models.
So essentially you would need 4x compute, 1/4 the context, and 4x the parameters to squeeze 2-4% more performance out of it.
And that falls apart as soon as you need more than 1/4 of the original context. Realistically you have to support the same effective context, and since attention cost grows with the square of sequence length, your compute goes up another 4x, to 16x.
That's why it takes so much compute.
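A minimal sketch of that 4x-to-16x step (the sequence lengths are made up; the only point is that self-attention cost grows roughly with the square of sequence length):

    # Sketch of the scaling argument: if dropping the tokenizer multiplies
    # sequence length by ~4 and self-attention cost grows roughly with the
    # square of sequence length, attention compute for the same text grows ~16x.
    LENGTH_MULTIPLIER = 4   # ~4 characters per former token (assumption)
    tokenized_len = 8192    # hypothetical context length in tokens
    char_level_len = tokenized_len * LENGTH_MULTIPLIER  # same text, character-level

    def attention_cost(seq_len: int) -> int:
        """Attention FLOPs ~ seq_len**2; constant factors dropped."""
        return seq_len ** 2

    ratio = attention_cost(char_level_len) / attention_cost(tokenized_len)
    print(f"Attention cost for the same text: {ratio:.0f}x")  # -> 16x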
ashirviskas 3 months ago
This has a ton of seemingly random assumptions. Why can't we compress multiple latent-space representations into one? Even in simple tokenizers, the token "and" has no right to be the same size as "scientist".
kenjackson 3 months ago
Thanks. That helps a lot.