Comment by cubefox

20 hours ago

Most people don't know that there are fundamentally two different kinds of "union" types: "tagged unions" and "untagged unions". Now .NET is introducing tagged unions, but unfortunately they stick to the popular tradition of calling them just "unions", which greatly adds to the confusion.

To clarify, these tagged unions are fundamentally different from the untagged unions that can be found in languages like Typescript or Scala 3.

Tagged unions (also called "discriminated unions" or "sum types") are algebraic data types. They act as wrappers, via special constructors, for two (or more) disjoint types and values. So a tagged union acts like a bag that has two (or more) labelled ("tagged") slots, where each slot has exactly one type and exactly one of these slot can take a value.

Untagged unions are set theoretic (rather than algebraic) data types. They don't require wrapping via a special constructor. They behave like the logical OR. They are not disjoint slots of a separate construct. A variable or function x with the type "Foo | Bar" can be of type Foo, or Bar, or both. To access a Foo method on x, one has to first perform a typecheck for Foo, otherwise the compiler will refuse the method call (since x might only have the type Bar which would produce an exception). If a variable is of type A, it is also of type A|B ("A or B"). There are also intersection types (A&B) which indicate that something has both types rather than at least one, and complement/negation types (~A indicates that something is of any type except A). Though the latter are not implemented in any major language so far.

13 comments

cubefox

jaen 11 hours ago

These are not what are commonly called tagged unions. It's actually closer to an untagged union.

The C# union does not store any discriminator. Just look at the implementation - it's a single `object?` field.

The discriminative part is handled by run-time type information, which is stored in the object itself, not the union - which is why the C# built-in implementation requires boxing.

Also, you can use the same class/record type ("discriminator") in several different unions - again, a feature which ADTs/sum-types/tagged unions in most functional languages do not have.

You can even store one single object (ie. identical by reference equality) in several different union values at the same time, theoretically, which in combination with mutability is... uhh, certainly not common functional/mathematical semantics.

troad 18 hours ago

Underlyingly, all unions are the same concept, exposed by C's `union` keyword, namely overlapping memory (as opposed to `struct`s, which are sequential memory). You can add a discriminator tag to your unions, your call. If it is impossible for there to be ambiguity (e.g. you track the state somewhere else, and you're dealing with a large array), it may be redundant / memory inefficient to tag every item.

Having these tags be a language construct is just a DX feature on top of unions. A very handy one, but it doesn't make tagged and untagged unions spring from different theoretical universes. I enjoy the ADT / set-theoretic debate as much as the next PL nerd, but theory ought to conform to reality, not vice versa.

cubefox 16 hours ago
That's like saying Haskell is really an imperative language just because it's written in C. But it doesn't matter how it is implemented, it matters how it behaves on the surface.
- troad 16 hours ago
  
  > That's like saying Haskell is really an imperative language just because it's written in C
  I honestly don't even remotely understand how you got to that from what I wrote.
  Unions already have a definition. You cannot go around claiming that technically unions are not unions (and that "most people don't know" this), because theory X or Y overloads the term union to refer to some unrelated concept. That's fine and well and good, within that theory, but it does not displace the usual CS definition of union. It is a load-bearing definition.
  A tagged union is so called because it is a union (overlapping memory) with a field (a tag) containing a discriminator (aka discriminated union). An untagged union is a union without this tag. These terms didn't come out of nowhere. They are transparent descriptors of what's happening.
  It doesn't matter whether Haskell is implemented in C or Ruby, and it doesn't matter whether it's functional, object-oriented, or a DSL for modding Elden Ring, none of this changes what a union is. I'm very supportive of type theories exploring the option space for type constructs, but someone's going to have to implement these somehow. Such as with (tagged) unions. You know, that CS concept we have, with a settled and well understood meaning.
  To be fair to you, I think you're letting a very narrow definition of 'union' shadow a much more usual and common definition of the word. And that's totally fine if that's the definition pertinent to you and your work, but if that's the case, maybe don't go around pontificating about how most people do not understand unions?
  
  9 replies →