
Comment by rixed

3 days ago

Just browsed the doc to get the answers to two burning questions, which I will dump here in case it saves others some time:

  - uses LLVM (so: as portable as LLVM)
  - sadly, does not support tagged enums

Apart from that it adds a few very desirable things, such as introspection and macros.

IMHO the downsides of tagged unions (e.g. what Rust confusingly calls "enums") are big enough that they should be used rarely, if at all, in a systems programming language, since they shoehorn a dynamic type system concept back into an otherwise statically typed language.

A tagged union always needs at least as much memory as its largest variant, but even worse, it nudges the programmer towards 'any-types', which basically moves type checking from compile time to run time. But then why use a statically typed language at all?

And even if they are useful in some rare situations, are the advantages big enough to justify spending 'syntax surface' on them instead of rolling your own tagged unions when needed?

  • Tagged unions (not enums, sorry) are not a dynamic type system concept. Actually, I would not be able to name a single dynamically typed language that has them.

    As for the memory allocation, I can't see why any object should have the size of the largest alternative. When I do the manual equivalent of a tagged union in C (i.e. a struct with a tag followed by a union), I malloc only the required size, and a function receiving a pointer to this object had better not assume any size before looking at the tag. Oh, you mean when the object is automatically allocated on the stack, or stored in an array? Then yes, sure. But that's going to be small change if it's on the stack, and for the array, well, there is no way around it; if that does not suit your design, then keep only the tags in the array?
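    A minimal C sketch of allocating only the required size. It uses a slightly different layout than a struct containing a union: each variant is its own struct that embeds a common tag header as its first member, so a pointer to the header can be passed around and converted back to the variant once the tag has been checked. The `shape`, `circle` and `rect` types and the helper functions are hypothetical, made up for illustration.

```c
#include <stdlib.h>

/* Hypothetical tagged type: every variant begins with the same header. */
enum shape_tag { SHAPE_CIRCLE, SHAPE_RECT };

struct shape  { enum shape_tag tag; };             /* common header */
struct circle { struct shape hdr; double r; };     /* smaller variant */
struct rect   { struct shape hdr; double w, h; };  /* larger variant */

/* A "dynamic sizeof": the size depends on the tag, not on the biggest variant. */
size_t shape_size(const struct shape *s) {
    switch (s->tag) {
    case SHAPE_CIRCLE: return sizeof(struct circle);
    case SHAPE_RECT:   return sizeof(struct rect);
    }
    return 0; /* unknown tag */
}

/* Each constructor mallocs exactly its own variant's size. */
struct shape *circle_new(double r) {
    struct circle *c = malloc(sizeof *c);
    if (c == NULL) return NULL;
    c->hdr.tag = SHAPE_CIRCLE;
    c->r = r;
    return &c->hdr;
}

struct shape *rect_new(double w, double h) {
    struct rect *q = malloc(sizeof *q);
    if (q == NULL) return NULL;
    q->hdr.tag = SHAPE_RECT;
    q->w = w;
    q->h = h;
    return &q->hdr;
}

/* Receivers look at the tag before assuming anything about size or fields.
   Converting the header pointer back is valid because it is the first member. */
double shape_area(const struct shape *s) {
    switch (s->tag) {
    case SHAPE_CIRCLE: {
        const struct circle *c = (const struct circle *)s;
        return 3.14159265358979 * c->r * c->r;
    }
    case SHAPE_RECT: {
        const struct rect *q = (const struct rect *)s;
        return q->w * q->h;
    }
    }
    return 0.0;
}
```

    Here `shape_size` plays the role of the tag-reading "dynamic sizeof": a circle allocation is smaller than a rect allocation, and `shape_area` only converts the header pointer after switching on the tag.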

    Tagged unions are a thing, whether the language helps or not. When I program in a language that has them then it's probably a sizeable fraction of all the types I define. I believe they are fundamental to programming, and I'd prefer the language to help with syntax and some basic sanity checks: say, a dynamic sizeof that reads the tag so it's easier to malloc the right amount, or a syntax that makes it impossible to access the wrong field (i.e. any lightweight pattern matching will do).
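    Plain C cannot make reading the wrong field impossible, but a common workaround is to route all access through accessors that check the tag first. A hypothetical sketch, assuming a two-variant `num` type built as a tag plus a C11 anonymous union:

```c
#include <stdbool.h>

/* Hypothetical tagged union in the usual C style: tag + anonymous union (C11). */
enum num_tag { NUM_INT, NUM_FLOAT };

struct num {
    enum num_tag tag;
    union {
        int   i;
        float f;
    };
};

/* Constructors set the tag and the matching field together. */
struct num num_from_int(int i)     { return (struct num){ .tag = NUM_INT, .i = i }; }
struct num num_from_float(float f) { return (struct num){ .tag = NUM_FLOAT, .f = f }; }

/* Checked accessors: the field is only read after the tag has been checked,
   so a mismatch is reported instead of silently reinterpreting the bytes. */
bool num_as_int(const struct num *n, int *out) {
    if (n->tag != NUM_INT) return false;
    *out = n->i;
    return true;
}

bool num_as_float(const struct num *n, float *out) {
    if (n->tag != NUM_FLOAT) return false;
    *out = n->f;
    return true;
}
```

    This is only a convention: nothing stops other code from touching `n.i` directly, which is exactly the kind of check a language-level pattern match would enforce.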

    In other words, I couldn't really figure out the downside you had in mind :)

    • > Actually, I would not be able to name a single dynamically typed language that has them.

      That's because every type in a dynamically typed language is a tagged union ;) For instance, in JavaScript you need to inspect a variable with 'typeof' to find out whether it is a string, a boolean, a number or something else.

      In a dynamically typed language, the runtime system needs to carry around information about what type an item actually is, and this is the same thing as the type tag in a tagged union; Rust's match is the same sort of runtime type inspection as typeof in JS, just with slightly different syntax sugar.
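      To make the analogy concrete, here is a hypothetical C sketch of a dynamic language value represented as a tagged union, with a `typeof`-like function that only inspects the tag. The type names and the set of kinds are assumptions for illustration, not how any particular JavaScript engine actually stores values.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical runtime value: the tag is the type information `typeof` reads. */
enum value_kind { VAL_UNDEFINED, VAL_BOOLEAN, VAL_NUMBER, VAL_STRING };

struct value {
    enum value_kind kind;
    union {
        bool        b;
        double      n;
        const char *s;
    };
};

/* Rough analogue of JavaScript's typeof: it looks only at the tag. */
const char *value_typeof(const struct value *v) {
    switch (v->kind) {
    case VAL_UNDEFINED: return "undefined";
    case VAL_BOOLEAN:   return "boolean";
    case VAL_NUMBER:    return "number";
    case VAL_STRING:    return "string";
    }
    return "unknown";
}

/* Mirrors the JS idiom `typeof x === "number"`. */
bool value_is(const struct value *v, const char *type_name) {
    return strcmp(value_typeof(v), type_name) == 0;
}
```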

      > As for the memory allocation, I can't see why any object should have the size of the largest alternative.

      When you have a Rust enum like this:

          enum Bla {
              AByte(u8),
              AString(String),
              AStruct{ x: i64, y: i64 },
          }
      

      ...then every Bla value is at least as large as its biggest variant, even when the active variant is 'AByte'. On a typical 64-bit target a String alone is three machine words (pointer, capacity, length), so that's at least 24 bytes for what could be a single byte. Plain unions in C have the same problem, of course, but those are rarely used; the one thing unions are really useful for in C (not C++!) is having different views on the same memory.
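      For comparison, a hypothetical C analogue of `Bla` as a tag plus a union. The string variant is modeled as pointer, length and capacity to roughly mirror what a Rust String holds; this is an illustration, not Rust's actual layout.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a Rust String: pointer + length + capacity. */
struct string_repr { char *ptr; size_t len; size_t cap; };

enum bla_tag { BLA_BYTE, BLA_STRING, BLA_STRUCT };

struct bla {
    enum bla_tag tag;
    union {
        uint8_t            a_byte;
        struct string_repr a_string;
        struct { int64_t x, y; } a_struct;
    };
};

/* A union is as large as its largest member (plus padding), and the tag
   comes on top, so even a BLA_BYTE value occupies sizeof(struct bla). */
_Static_assert(sizeof(struct bla) >= sizeof(struct string_repr),
               "struct bla must hold its largest variant");

struct bla bla_byte(uint8_t v) {
    return (struct bla){ .tag = BLA_BYTE, .a_byte = v };
}
```

      Even a value built with `bla_byte` takes the full `sizeof(struct bla)`; on common 64-bit targets that is typically 32 bytes (a 24-byte union plus the tag, padded for alignment).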

      > When I program in a language that has them then it's probably a sizeable fraction of all the types I define

      ...IMHO 'almost always sum types' is a serious design smell. It might be OK in 'everything is a reference' languages like TypeScript, but that's because you pay for the runtime overhead anyway, whether sum types are used or not.


  • Tagged enums != any type (i.e. runtime casting)

    Tagged enums are everywhere. I am writing a microkernel in C, and how I wish I had tagged enums instead of writing the same boilerplate over and over:

      enum foo_type {
        FOO_POINTER,
        FOO_INT,
        FOO_FLOAT,
      };
    
      struct foo {
        enum foo_type type;
        union {
          void *val_pointer;
          int val_int;
          float val_float;
        };
      };
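    For comparison, a hypothetical sketch of the rest of that boilerplate: one constructor per variant so that tag and payload are always set together, and a switch over the tag in every consumer. GCC's `-Wswitch` (enabled by `-Wall`) warns when such a switch has no `default` and lacks a case for one of the enum values, which is about as close as plain C gets to exhaustive matching. The helper names are made up for illustration.

```c
#include <stdio.h>

enum foo_type {
  FOO_POINTER,
  FOO_INT,
  FOO_FLOAT,
};

struct foo {
  enum foo_type type;
  union {
    void *val_pointer;
    int val_int;
    float val_float;
  };
};

/* Constructors: one per variant, so tag and payload cannot drift apart. */
struct foo foo_pointer(void *p) { return (struct foo){ .type = FOO_POINTER, .val_pointer = p }; }
struct foo foo_int(int i)       { return (struct foo){ .type = FOO_INT, .val_int = i }; }
struct foo foo_float(float f)   { return (struct foo){ .type = FOO_FLOAT, .val_float = f }; }

/* Every consumer repeats the same switch; with no default label,
   -Wswitch can flag a forgotten case when a new tag is added. */
int foo_format(const struct foo *f, char *buf, size_t len) {
  switch (f->type) {
  case FOO_POINTER: return snprintf(buf, len, "pointer %p", f->val_pointer);
  case FOO_INT:     return snprintf(buf, len, "int %d", f->val_int);
  case FOO_FLOAT:   return snprintf(buf, len, "float %g", (double)f->val_float);
  }
  return -1;
}
```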

    • > ...runtime casting...

      ...but what else is a select on a tagged union than 'runtime casting', though? You have a single 'sum type' value, and you don't know which concrete type it actually holds at runtime until you look at the tag and 'cast' to the concrete type associated with that tag. The fact that some languages have syntax sugar for the selection doesn't make the runtime overhead magically disappear.
