Comment by dfox

4 years ago

For me, the moral of the story is: do not use (whatever)scanf() for anything other than toy programs. In most cases, implementing your own tokenizer is significantly easier than reasoning about what scanf() really does (even ignoring accidentally quadratic implementation details) and whether it can be adapted to your use case. For both of these cases of reading numbers, that means str(c)spn() to get the length of the candidate token and then strtosomething() to convert it.
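A minimal sketch of that approach in C, assuming whitespace-separated decimal integers: strspn() skips delimiters, strcspn() measures the candidate token, and strtol() converts it. The function name parse_longs, its signature, and its -1 error convention are made up for illustration, not taken from any library.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: split `s` on whitespace and convert each token to a
   long. Returns the number of values stored in `out`, or -1 if a token is
   not a valid in-range decimal integer or there are more than `max` tokens. */
int parse_longs(const char *s, long *out, size_t max)
{
    static const char delims[] = " \t\n\v\f\r";
    size_t count = 0;

    assert(s != NULL && out != NULL);

    for (;;) {
        s += strspn(s, delims);           /* skip leading delimiters */
        if (*s == '\0')
            break;                        /* no more tokens */

        size_t len = strcspn(s, delims);  /* length of candidate token */
        if (count == max)
            return -1;                    /* output buffer is full */

        char *end;
        errno = 0;
        long value = strtol(s, &end, 10);

        /* Accept only if strtol consumed exactly the token and the value
           fits in a long; "12abc" or "abc" are rejected here. */
        if (end != s + len || errno == ERANGE)
            return -1;

        out[count++] = value;
        s += len;                         /* continue after this token */
    }
    return (int)count;
}
```

Called on "10  -3\t42\n" with room for four values, this stores 10, -3 and 42 and returns 3, while "12abc" is rejected outright. Comparing the strtol() end pointer against the token boundary is what gives the strict validation that is awkward to get out of scanf().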

Can you ELICSUndergraduate? Tokenizing is normally for when you're writing a compiler or a DSL, right?

  • Yes. But any data format you read, particularly any plaintext data format you read, is essentially interpreting or compiling a DSL. On a typical job, people are writing compilers much more often than they think!

  • It is a general term for the process of breaking a string into "tokens", each of which carries some meaning. Definitely a common task in compilers, but not limited to them.