Comment by munchler
3 days ago
> I’ve also heard that functional languages lend themselves especially well to parsing tasks, and I’ve never understood why, so it’s a good opportunity to learn.
This is mainly due to a technique called "parser combinators", which can be expressed very cleanly in functional languages. The idea is to start with a standard set of primitive general-purpose parsers and combine them into more complex parsers that are specific to the task at hand. If you have an interest in either functional programming or parsing in general, parser combinators are good to have in your toolkit.
It looks like Gleam has at least two parser combinator packages that you might be able to use:
• party (https://hexdocs.pm/party/)
• parser_gleam (https://hexdocs.pm/parser_gleam/)
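To make the idea concrete, here is a minimal sketch in Python using the parsy library (the toy grammar is invented for illustration; the same shape carries over to the packages above):

    from parsy import regex, string

    # Primitive, general-purpose parsers...
    integer = regex(r"-?\d+").map(int)
    comma = string(",") << regex(r"\s*")

    # ...combined into a task-specific one: a parenthesised integer list.
    int_list = string("(") >> integer.sep_by(comma) << string(")")

    print(int_list.parse("(1, 2,3)"))  # [1, 2, 3]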
Parser combinators are very slow in languages without a sufficiently optimising compiler, so I wouldn't recommend them here.
In Gleam, pattern matching and the "splitter" package are the go-to for text parsing.
I don't think parser combinators are any more difficult in imperative languages like Python than in functional ones? I found a few combinator libraries for Python and C++.
Don’t know about difficult, but at least less elegant. Lazy evaluation, type inference, and abstractions like Functor/Applicative/Alternative/Monad make them so incredibly natural to work with in a language like Haskell. Sure, they exist in other languages (I've made a few myself), but it’s not the same.
Yes, I'm writing a parser in Unison (in the Haskell family) for ASN.1 at the moment. It's so clean to write parsers with parser combinators.
For example, Asn1Type can be of form Builtin, Referenced, or Constrained. So, a sum type.
Assuming you have the parsers for Builtin, Referenced, and Constrained, you're golden. (Haskell PCs look very similar, possibly even exactly the same, minus different parenthesisation for operator-precedence reasons.)
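For illustration only (Python with the parsy library rather than Unison, and with stub parsers standing in for real ASN.1 ones), the sum type maps directly onto alternation:

    # Stub parsers; the real builtin/referenced/constrained are assumed.
    from parsy import string

    builtin = string("INTEGER").result("Builtin")
    referenced = string("MyType").result("Referenced")
    constrained = string("INTEGER (0..255)").result("Constrained")

    # Backtracking alternation: order matters, so try the longest first.
    asn1_type = constrained | builtin | referenced

    print(asn1_type.parse("INTEGER (0..255)"))  # Constrained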
Compare Parsy for Python, particularly the verbosity. A parser for SQL SELECT statements ends up threading every clause through a long seq(...) pipeline that finishes with ).combine_dict(Select) to pack the fields into a Select object.
The same thing in an FP-style language would be something like Select <$> columnsP <*> tableP, with each parsed piece feeding straight into the constructor.
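A cut-down sketch of the Parsy version (simplified and illustrative; the Select dataclass and grammar are made up here, and a real parser would also handle WHERE clauses, quoting, and so on):

    from dataclasses import dataclass
    from parsy import regex, seq, string

    @dataclass
    class Select:
        columns: list
        table: str

    padding = regex(r"\s+")
    identifier = regex(r"[A-Za-z_][A-Za-z0-9_]*")

    # Each clause is named, collected into a dict, then packed into Select.
    select = seq(
        columns=string("SELECT") >> padding
            >> identifier.sep_by(regex(r"\s*,\s*"), min=1),
        table=padding >> string("FROM") >> padding >> identifier,
    ).combine_dict(Select)

    print(select.parse("SELECT a, b FROM t"))
    # Select(columns=['a', 'b'], table='t')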
The general notion of a “combinator” comes from functional programming, so parser combinators are native to FP. They can definitely be done in imperative languages also, but in my experience (C# specifically) are much more clunky. A dynamically typed language like Python might be suitable, because the interpreter is lenient, but I’m dubious about C++.
One thing you will definitely need in any language to produce a concise parser is the ability to define custom operators, like >>. and .>>. These are the “combinators”.
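Python can't define brand-new operators like >>., but overloading the existing ones gets you most of the way; a toy sketch (not any real library's API):

    # Overloading >> ("keep right") and << ("keep left") on a minimal
    # Parser class, standing in for FParsec-style >>. and .>>. operators.
    class Parser:
        def __init__(self, fn):
            self.fn = fn  # fn: str -> (value, rest); raises ValueError on failure

        def __rshift__(self, other):  # p >> q: run both, keep q's value
            def fn(s):
                _, rest = self.fn(s)
                return other.fn(rest)
            return Parser(fn)

        def __lshift__(self, other):  # p << q: run both, keep p's value
            def fn(s):
                value, rest = self.fn(s)
                _, rest = other.fn(rest)
                return value, rest
            return Parser(fn)

    def lit(tok):
        def fn(s):
            if s.startswith(tok):
                return tok, s[len(tok):]
            raise ValueError(f"expected {tok!r} at {s!r}")
        return Parser(fn)

    # Parses "[x]" and keeps only the "x" in the middle.
    print((lit("[") >> lit("x") << lit("]")).fn("[x]"))  # ('x', '')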
All Turing-complete languages are technically equivalent in capability, but that doesn't mean the same task will cost the same effort in each language.
The closer a task is to the limit of your ability to handle complexity, the more you will benefit from tools that simplify the task.
The choice of programming language unfortunately determines what tools are possible to implement ergonomically (in terms of both implementation and interface).
The way you would implement them in Python or C++ is by using their functional facilities, or by writing a library that emulates them.
The key feature is first-class functions, so you can have combinators like "or()", "and()", "repeat()", "optional()", etc., which take a function pointer as a parameter, as well as the input character stream, and return the matched characters from the input stream along with the rest of the stream. When all combinators have this function signature, you can compose them easily, and writing the parser becomes a matter of transcribing the EBNF grammar.
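A minimal sketch of that signature in Python (all names invented; "or"/"and" are renamed either/then because they're Python keywords):

    # A parser is a function: stream -> (matched, rest), with matched = None
    # signalling failure and the stream left unconsumed.
    def char(c):
        def parse(s):
            return (c, s[1:]) if s.startswith(c) else (None, s)
        return parse

    def either(p, q):  # "or()": try p, fall back to q
        def parse(s):
            matched, rest = p(s)
            return (matched, rest) if matched is not None else q(s)
        return parse

    def then(p, q):  # "and()": p followed by q, results concatenated
        def parse(s):
            m1, rest = p(s)
            if m1 is None:
                return None, s
            m2, rest = q(rest)
            return (m1 + m2, rest) if m2 is not None else (None, s)
        return parse

    def repeat(p):  # zero or more, like EBNF {...}
        def parse(s):
            matched = ""
            while True:
                m, rest = p(s)
                if m is None:
                    return matched, s
                matched, s = matched + m, rest
        return parse

    # EBNF: binary = ("0" | "1"), {"0" | "1"} ;
    bit = either(char("0"), char("1"))
    binary = then(bit, repeat(bit))
    print(binary("1010rest"))  # ('1010', 'rest')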
Or thanks to algebraic data types and pattern matching? Even in a modest recursive descent parser, these can be really nice to use.
Sure, although those are great FP features for all kinds of programming, not just parsing.
I'd say it's mostly because recursion is usually better for tree-like data structures, plus built-in TCO and exhaustive pattern matching. This is why OCaml is usually the go-to for building a new language prototype.
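You can get a taste of the recursion-plus-pattern-matching style even in Python 3.10+ (illustrative toy types; what OCaml adds on top is exhaustiveness checking and TCO):

    from dataclasses import dataclass

    # A tiny expression tree: a sum type of Num and Add nodes.
    @dataclass
    class Num:
        value: int

    @dataclass
    class Add:
        left: object
        right: object

    # Recursive evaluation by structural pattern matching on the node type.
    def eval_expr(e):
        match e:
            case Num(value=v):
                return v
            case Add(left=l, right=r):
                return eval_expr(l) + eval_expr(r)

    print(eval_expr(Add(Num(1), Add(Num(2), Num(3)))))  # 6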