← Back to context

Comment by jugoetz

7 hours ago

SMILES and SELFIES are molecular graph representations and aren't meant to solve the "parse this sum formula" problem.

SELFIES are for genAI. If you ask a VAE to generate SMILES, it will spit out some strings that are invalid - can't happen with SELFIES, that is the one application where they are robust.

It's still being argued if you really need SELFIES, or if SMILES autoencoders can be trained to only generate valid molecules, or if generating invalid molecules is useful (I'm in camp SELFIES, but I also want better ways to represent and learn on graphical chemical structures, ratehr than serialized strings).

  • can you guys explain what makes SELFIES robust? I'd only heard of SMILES until this thread, but I have been out of this space for 10 years.