Comment by schoen
6 years ago
I learned a lot from this article and hope to revisit it.
I had one quarrel:
> Consider: what is a parser? Really, a parser is just a function that consumes less-structured input and produces more-structured output. By its very nature, a parser is a partial function—some values in the domain do not correspond to any value in the range—so all parsers must have some notion of failure.
I think it's reasonable to include some total functions as parsers as well, because some grammars -- the structures that specify the languages a parser recognizes -- generate the set of all strings as their language. In that case, there are no failure cases, because every string is a valid input.
I came up with some more-trivial examples, but I think a less-trivial example is the split() function in many languages. Its Haskell type signature is String -> [String], and it is a total function. It handles the case where the input is a nonempty string with no delimiters (the output is a list containing a single token equal to the input), the case where the input is an empty string (the output is an empty list), and the case where the input consists only of delimiters (in many implementations, an input of n delimiters yields a list of n+1 empty strings).
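To make those three cases concrete, here is a minimal Python sketch of such a total split. The function name and the empty-input behavior follow the description above, not any particular library; real implementations differ (for example, Python's built-in `str.split` returns `['']` for an empty string rather than `[]`):

```python
def split(s: str, delim: str = ",") -> list[str]:
    """A total split: every input string maps to some list of tokens."""
    if not s:
        return []          # empty input -> empty list, per the description above
    return s.split(delim)  # otherwise defer to Python's built-in split

# No delimiters: a single token equal to the input.
assert split("hello") == ["hello"]
# Empty input: an empty list.
assert split("") == []
# Only delimiters: n delimiters produce n+1 empty tokens.
assert split(",,") == ["", "", ""]
```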
Another trivial-ish case is where the grammar allows an arbitrary trailing string following the object of interest (similar to C's atoi). In that case, a parser will always return the empty object if there is no interesting or relevant data in the input. (For example, atoi returns 0 if the input string doesn't begin with an integer.) This could be viewed as a kind of error, but there may be applications in which this behavior is useful.
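The atoi-style behavior can likewise be sketched as a total function. `atoi_like` is a hypothetical name for illustration; C's real atoi also has undefined behavior on overflow, which this sketch ignores since Python integers are unbounded:

```python
import re

def atoi_like(s: str) -> int:
    """Total parser in the style of C's atoi: skip leading whitespace,
    read an optional sign and digits, ignore any trailing text.
    If no leading integer is found, return 0 rather than failing."""
    m = re.match(r"\s*[+-]?\d+", s)
    return int(m.group()) if m else 0

assert atoi_like("42abc") == 42    # trailing garbage is ignored
assert atoi_like("abc") == 0       # no leading integer -> 0, not an error
assert atoi_like("  -7 foo") == -7
```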
I think this is a reasonable point, and your split() example is a good one. I wasn’t sure while I was writing the blog post if I considered isomorphisms to be parsers, and I’m still not entirely sure if I do or not, but there’s an argument to be made that they are simply parsers where the set of possible failures is empty. I don’t have strong feelings about that distinction one way or the other, though.
Further nit-picking: even if there are no errors, I don't think it's an isomorphism unless the parser preserves all information? Usually parsers treat some input strings as equivalent (by stripping whitespace and comments), so there are fewer outputs than valid inputs.
I can't think of an easy way to make string splitting an isomorphism in both directions. You can serialize any list of strings to a comma-separated list, but only if you use escape sequences or encode the length of each item, and then there are some input strings that are errors.
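A sketch of the escape-sequence approach (with hypothetical `encode`/`decode` helpers) shows both points: the round trip `decode(encode(xs)) == xs` works for nonempty lists, but the empty list and the singleton empty string collide, and an input ending in a dangling backslash is an error -- so full bijectivity fails in exactly the ways described:

```python
def encode(items: list[str]) -> str:
    # Escape backslashes first, then commas, so decoding is unambiguous.
    return ",".join(s.replace("\\", "\\\\").replace(",", "\\,") for s in items)

def decode(s: str) -> list[str]:
    items, buf, i = [], [], 0
    while i < len(s):
        c = s[i]
        if c == "\\":
            if i + 1 >= len(s):
                raise ValueError("dangling escape")  # some inputs are errors
            buf.append(s[i + 1])  # take the escaped character literally
            i += 2
        elif c == ",":
            items.append("".join(buf))  # delimiter ends the current token
            buf = []
            i += 1
        else:
            buf.append(c)
            i += 1
    items.append("".join(buf))
    return items

# Round trip succeeds for nonempty lists, even with embedded delimiters.
assert decode(encode(["a,b", "c\\d", ""])) == ["a,b", "c\\d", ""]
# But [] and [""] both encode to "", so encode is not injective.
assert encode([]) == encode([""]) == ""
```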
I guess you could define a type as the set of all valid strings in the input language, but that's going out of your way to make sure parsing has to be done more than once.
Parsers that report no errors don't seem desirable anyway; error-checking is usually good.
Some other random feedback: the title's short phrasing makes me think that I should parse instead of validating. Might I suggest: "Don't just validate: parse"?
The idea really is that parsing subsumes validation. If you’re parsing, you don’t have to validate, because validation is a natural side-effect of the parsing process. And indeed, I think that’s the very thesis of the blog post: parsing is validation, just without throwing away all the information you learned once you’ve finished!
Structure is orthogonal to partiality.
Partiality is what you get when you choose to accept only one branch of a disjunctive structure.
This seems needlessly pedantic - we could also notice that a total function is just a special case of a partial function, and so total functions are already included by the article.