Comment by chermi
6 hours ago
Thanks, that's very interesting! Naive question, but why couldn't you force a specific tokenization scheme on SMILES? Specifically, just one token per element? I understand SELFIES does more, but your example of Ba/Br made me wonder.
I asked the authors of the original SMILES paper and they didn't have a good answer. I wrote a parser for SMILES so I could tokenize that way, but I never followed up, and eventually SELFIES was announced.
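For anyone curious what "one token per element" could look like in practice, here is a minimal sketch using a regular expression. It is not the parser mentioned above; the names `SMILES_TOKEN` and `tokenize_smiles` are made up for illustration, and the pattern only covers the common SMILES syntax (bracket atoms, the organic subset, aromatic atoms, bonds, branches, ring closures). The key point for the Ba/Br case is that `Br` and `Cl` are tried before the single letters `B` and `C`, and that anything in square brackets, such as `[Ba]`, is kept as one token.

```python
import re

# Hypothetical atom-level SMILES tokenizer (illustrative only).
SMILES_TOKEN = re.compile(
    r"\[[^\]]+\]"            # bracket atom, e.g. [Ba], [NH4+], [C@@H]
    r"|Br|Cl"                # two-letter organic-subset elements, tried before B and C
    r"|[BCNOPSFI]"           # one-letter organic-subset elements
    r"|[bcnops]"             # aromatic atoms
    r"|%\d{2}|\d"            # ring-closure labels
    r"|[()=#$:/\\.\-+~*@]"   # branches, bonds, dot separator, wildcard
)


def tokenize_smiles(smiles):
    """Split a SMILES string into atom-level tokens.

    Raises ValueError if some character is not covered by the pattern,
    so unsupported input is rejected instead of being silently dropped.
    """
    tokens = SMILES_TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"could not tokenize SMILES string: {smiles!r}")
    return tokens


print(tokenize_smiles("BrCC(=O)Cl"))
print(tokenize_smiles("[Ba+2].c1ccccc1"))
```

With this scheme, `Br` comes out as a single token rather than `B` followed by `r`, and barium can only appear as a bracket atom, so the two never collide. Note that this only fixes the token boundaries; it does nothing about the validity guarantees that SELFIES adds on top.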