Comment by mehrdadn

6 years ago

But how can this possibly be true in general? As an example, just imagine I want to check that a string represents a list of space-delimited integers, where each one is also less than 10 digits. It's far more trivial to verify that than to actually parse the integers. And by performing the verification pass separately before the parsing pass, I can reject invalid inputs early, leading to much faster rejections than if I parse them as I validate them. The only way I can see there being zero cost difference is if everything is implicitly lazy, such that at runtime the verification won't even happen until the parsing needs to be performed too. Right?
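A minimal sketch of that two-pass structure, assuming Python, with the helper names (`is_valid`, `parse`) and the exact digit-count rule invented for illustration:

```python
import re

# Hypothetical pre-pass: space-delimited runs of 1-9 digits
# (i.e. each number is "less than 10 digits", as above).
TOKEN_RE = re.compile(r"[0-9]{1,9}(?: [0-9]{1,9})*")

def is_valid(s: str) -> bool:
    """Cheap verification pass: checks the shape, builds no integers."""
    return TOKEN_RE.fullmatch(s) is not None

def parse(s: str) -> list[int]:
    """Full parsing pass, run only after validation succeeds."""
    return [int(tok) for tok in s.split(" ")]

s = "12 345 0"
if is_valid(s):       # fast rejection happens here
    nums = parse(s)   # the expensive work happens here
```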

It's true that in the failure case you can get faster results by short-circuiting. If failure cases make up a substantial portion of your runtime, then yes, doing a fast pre-pass can be more efficient.

I'd speculate that in the real world, 99% of cases have their runtime dominated by the success case. Exceptions would be things like DoS attacks.

  • No, that's just wrong. It's not just failures. Imagine I verified they were all zero; then I wouldn't have to do a full parse. Or if I verify they have only a few digits, then I avoid bignum parsing. I think this proves my point: this simply isn't true in general.

    • It's not necessarily wrong, though what you say makes sense.

      Any code that you write, you can move into the parser module. In your example, you have a function that checks the strings are all zero, and you wouldn't call the parser if it returned true. But you can simply declare this code to be part of the parser and just not call the rest of the parser. The difference is in the API: in the one case, you return false and in the other you return a parse error.
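      As a hypothetical sketch of that API difference (the name `parse_numbers` is invented), the all-zero check simply becomes the parser's internal fast path, and failure surfaces as a parse error rather than `false`:

      ```python
      def parse_numbers(s: str) -> list[int]:
          """Parser with the fast pre-check folded in as an internal fast path."""
          tokens = s.split(" ")
          # The all-zero check now lives *inside* the parser; when it hits,
          # we skip general number parsing entirely.
          if all(t == "0" for t in tokens):
              return [0] * len(tokens)
          # Otherwise fall through to the full parse. Invalid input raises
          # ValueError -- a parse error -- instead of returning False.
          return [int(t) for t in tokens]
      ```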

      Now, you may say that, knowing this information, you're going to parse into ints instead of BigNums. But the parser can do this too.

      You might also say that you happen to know, due to some context, that all the numbers have 10 digits or fewer, and that therefore you can do better than the parser. But if you make the context and the strings both inputs to the parser, then you bring it level again.

      Or you might say that your application has some context that gives it an edge, while the parser is a general-purpose library that does not understand that context. In that case, your application does indeed have an edge. But that applies to any general-purpose library and is not a point about the merits of parsing versus validation.

      What's maybe more interesting is if you only need some of the parsed results, or if the parsed results wouldn't fit into memory. But for that you just need an incremental or streaming parser.
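      A minimal sketch of that idea, with Python generators standing in for a real incremental parsing API (the names are illustrative):

      ```python
      import itertools
      from typing import Iterator

      def parse_stream(tokens: Iterator[str]) -> Iterator[int]:
          """Incremental parser: yields each integer on demand, so results
          that are never consumed are never parsed and never held in memory."""
          for tok in tokens:
              yield int(tok)

      # Only the first two numbers are ever parsed:
      first_two = list(itertools.islice(parse_stream(iter("5 6 7 8".split())), 2))
      ```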

      This is actually pretty common in real-world parsers. For example, JavaScript engines typically don't fully parse function bodies in their initial pass through the source.
