Comment by minimaxir
2 years ago
Relatedly, I checked and OpenAI deleted all references to their ChatML spec from their GitHub repo.
This is what it said in an earlier commit: https://github.com/openai/openai-python/blob/2942bf4bb635b1e...
Something I never understood about ChatML: were the "<|im_start|>" markers reserved sequences of text that mapped to specific integer tokens, but that you could not include in text you submitted to their API (or, if you did try, would they be tokenized differently)?
OpenAI presumably adds them as special tokens on top of the cl100k_base tokenizer, as demonstrated in the tiktoken documentation: https://github.com/openai/tiktoken#extending-tiktoken
In theory you could include them in normal input, but it's possible OpenAI has safeguards against that.
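To make the distinction concrete, here is a toy sketch of how a tokenizer can treat the same marker text in two ways. It is not tiktoken and not OpenAI's implementation: the `encode` function and the `allow_special` flag are hypothetical, ordinary text is simply mapped to UTF-8 byte values instead of real BPE tokens, and the two IDs are the illustrative ones from the tiktoken README example. As far as I know, tiktoken itself exposes a similar switch through the `allowed_special` and `disallowed_special` arguments of `encode`.

```python
import re

# Illustrative IDs for the ChatML markers; the real vocabulary is not reproduced here.
SPECIAL_TOKENS = {"<|im_start|>": 100264, "<|im_end|>": 100265}
_SPECIAL_RE = re.compile("|".join(re.escape(t) for t in SPECIAL_TOKENS))


def encode(text, allow_special=False):
    """Toy encoder: ordinary text becomes UTF-8 byte values (0-255)."""
    if not allow_special:
        # Untrusted input: the marker is just characters, so it becomes
        # many ordinary tokens and never the reserved ID.
        return list(text.encode("utf-8"))

    ids = []
    pos = 0
    for match in _SPECIAL_RE.finditer(text):
        # Encode the ordinary text before the marker, then emit the reserved ID.
        ids.extend(text[pos:match.start()].encode("utf-8"))
        ids.append(SPECIAL_TOKENS[match.group()])
        pos = match.end()
    ids.extend(text[pos:].encode("utf-8"))
    return ids


message = "<|im_start|>user\nhi<|im_end|>"
trusted = encode(message, allow_special=True)  # markers collapse to single reserved IDs
untrusted = encode(message)                    # markers are spelled out as plain bytes
```

With this setup the "trusted" path yields one reserved ID per marker, while the "untrusted" path can never produce those IDs, which is one plausible form a safeguard on user input could take.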