Comment by yorwba
12 hours ago
Anthropic was already special-casing case-folding in their tokenizers before this recent change: https://transformer-circuits.pub/2025/attribution-graphs/met... "The tokenizer the model was trained with uses a special “Caps Lock” token" (⇪). Their visualizations for Claude 3.5 Haiku also show the Title Case token (↑).
This is similar to what the TokenMonster tokenizer does: https://github.com/alasdairforsythe/tokenmonster
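The comment describes pulling casing out into separate marker tokens so that the rest of the word can be tokenized in lower case. A rough sketch of that idea as a pre-tokenization pass is below. It is a minimal illustration under stated assumptions, not Anthropic's or TokenMonster's actual scheme. The names `fold_case`/`unfold_case` and the rules for which words get a marker are hypothetical, the ⇪ and ↑ characters are reused only as stand-in symbols, and only ASCII letters are handled.

```python
import re

# Hypothetical marker characters, reusing the symbols from the comment above.
# Assumes the input text never contains these characters itself.
CAPS_LOCK = "\u21ea"  # ⇪: the following word was entirely upper case
TITLE = "\u2191"      # ↑: the following word had only its first letter upper case

WORD = re.compile(r"[A-Za-z]+")  # ASCII letters only, for simplicity
MARKED = re.compile(f"([{CAPS_LOCK}{TITLE}])([a-z]+)")


def fold_case(text: str) -> str:
    """Lower-case each all-caps or title-case word and prefix a casing marker."""
    def encode(m: re.Match) -> str:
        w = m.group(0)
        if len(w) > 1 and w.isupper():
            return CAPS_LOCK + w.lower()
        if w[0].isupper() and (len(w) == 1 or w[1:].islower()):
            return TITLE + w.lower()
        return w  # lower-case or mixed-case words (e.g. "iPhone") pass through
    return WORD.sub(encode, text)


def unfold_case(text: str) -> str:
    """Invert fold_case by re-applying each marker to the word after it."""
    def decode(m: re.Match) -> str:
        marker, w = m.group(1), m.group(2)
        if marker == CAPS_LOCK:
            return w.upper()
        return w[0].upper() + w[1:]
    return MARKED.sub(decode, text)


if __name__ == "__main__":
    s = "The NASA probe said Hello to iPhone users, I think."
    folded = fold_case(s)
    print(folded)  # ↑the ⇪nasa probe said ↑hello to iPhone users, ↑i think.
    assert unfold_case(folded) == s
```

With a pass like this, "Hello", "HELLO" and "hello" can share one lower-case vocabulary entry plus at most one marker token. Leaving mixed-case words such as "iPhone" untouched is just one possible design choice. A real tokenizer might split them or encode them differently.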