← Back to context

Comment by rspeele

19 hours ago

Object-oriented polymorphism (interfaces, inheritance) is for when you have a fixed set of methods to implement but an unbounded set of types that may want to implement them.

As a consumer, you cannot change the methods, but you can add a subtype. When you subtype an abstract class or an interface, the compiler does not let you proceed until you have implemented all the methods.

Discriminated unions are for the exact opposite situation, when you have a fixed set of subtypes, but unbounded set of methods to implement on them. As a consumer, you cannot add a subtype, but you can add a new method. When you write a new method, the compiler does not let you proceed until you have handled all the subtypes.

Good languages should support both!

The best example is abstract syntax trees, the data types that represent expressions and statements in a programming language. "Expression" breaks down into cases: integer literal, string literal, variable name, binary operations like add(expr1,expr2), unary operations like negate(expr), function call(functionName, exprs), etc.

Clearly all of these expression subtypes should belong to a base type `Expression`. But what methods do you put on `Expression`? If you're writing a compiler, you have to walk this syntax tree many times for very different purposes. First you might do a pass on it where you "de-sugar" syntax, then another pass where you type-check it and resolve names in the code, then another pass where you generate assembly code from it. Perhaps your compiler even supports different backends so you have a code-gen path for x86, another for ARM, etc. You'll likely want a pretty-printer so you can do automatic reformatting, maybe you want linting support, etc.

If you look at all those concerns and say that each subtype of `Expression` must implement methods for each one, then you end up with untenable code organization. Every expression subtype now has a huge stack of methods to implement all in one file, dealing with stuff from totally different layers of the compiler. It's a mess.

It's much cleaner to have the "shape" of the expression defined in one place without all that clutter, and then in each of those areas of the code you can write methods that consume expressions however they need, so each of those separate concerns lives in its own silo.

------------------------------------------------

Some real code (but it's F# not C#) to look at.

AST for my SQL dialect: https://github.com/fsprojects/Rezoom.SQL/blob/master/src/Rez... Typechecker code: https://github.com/fsprojects/Rezoom.SQL/blob/master/src/Rez... Backend code that outputs MS TSQL from it: https://github.com/fsprojects/Rezoom.SQL/blob/master/src/Rez...

------------------------------------------------

If you're an old hand at OO you may be familiar with its actual answer to this problem, the "Visitor" pattern. See System.Linq.Expressions.ExpressionVisitor. However, once you've used a language with good union and pattern matching support, this feels like a clunky hack. Basically the mirror image of a language without real object orientation imitating it by passing around closures and structs-of-closures.

------------------------------------------------

It doesn't just have to be compiler stuff. A business app data model can use this too. Instead of having:

    public class DbUser
    {
        public EmailAddress Email { get; set; }
        public PasswordHash? Password { get; set; } // null if they use SSO
        public SamlEntityProviderId? SamlProvider { get; set; } // null if they use password auth
    }

You could have:

    type UserAuth =
        | PasswordAuth of PasswordHash
        | SSOAuth of SamlIdentityProviderId

The implementation details of those different auth methods, the UI for them, etc. don't have to be part of the data model. We do have to model what "shapes" of data are acceptable, but "doing stuff" based on those shapes is another layer's problem.