Comment by miohtama
3 months ago
Shouldn't all caps normalised to tokens like low caps? There are no separate tokens for all caps and low caps in Llama, or at least not in the past.
3 months ago
Shouldn't all caps normalised to tokens like low caps? There are no separate tokens for all caps and low caps in Llama, or at least not in the past.
Looking at the tokenizer for the older Llama 2 model, the tokenizer has capital letters in it: https://huggingface.co/meta-llama/Llama-2-7b-hf