← Back to context

Comment by kadushka

7 months ago

These models are not trained on character level input. Why would anyone expect them to perform well on character level puzzles?

18 comments

kadushka

Reply

brrrrrm 7 months ago

emergent behavior. These things are surprisingly good at generalizing

Jensson 7 months ago

They are trained on many billions of tokens of text dealing with character level input, they would be rather dumb if they couldn't learn it anyway.

Every human learns that, when you hear the sound "strawberry" you don't hear the double r there, yet you still know the answer.

brookst 7 months ago
These models operate on tokens, not characters. It’s true that training budgets could be spent on exhaustively enumerating how many of each letter are in every word in every language, but it’s just not useful enough to be worth it.
It’s more like asking a human for the Fourier components of how they pronounce “strawberry”. I mean the audio waves are right there, why don’t you know?
- yahoozoo 7 months ago
  
  Although a vast majority of tokens are 4+ characters, you’re seriously saying that each individual character of the English alphabet didn’t make the cut? What about 0-9?
  
  14 replies →