Comment by miohtama
8 days ago
Shouldn't all caps normalised to tokens like low caps? There are no separate tokens for all caps and low caps in Llama, or at least not in the past.
8 days ago
Shouldn't all caps normalised to tokens like low caps? There are no separate tokens for all caps and low caps in Llama, or at least not in the past.
Looking at the tokenizer for the older Llama 2 model, the tokenizer has capital letters in it: https://huggingface.co/meta-llama/Llama-2-7b-hf