Dependent types and how to get rid of them
3 months ago (chadnauseam.com)
Recent and related: Why don't you use dependent types? - https://news.ycombinator.com/item?id=45790827 (116 comments)
I read the first post and thought someone should at the very least post the lambda cube. This isn't my area of expertise, since I've done very little with dependent types (staying firmly on the left side of the cube), but it outlines some useful categories of type systems here.
https://en.wikipedia.org/wiki/Lambda_cube
I wrote a small post on that: https://azdavis.net/posts/lambda-cube/
Hope it’s helpful!
Cool that you worked on an implementation of the CoC.
I've been somewhat wanting to go back and revisit ATTAPL and read chapter 2 on this subject: https://www.cis.upenn.edu/~bcpierce/attapl/frontmatter.pdf
That is a good post. I've linked to it from mine!
2 replies →
I see we need to add special syntax to the signature for dependent type variables.
If you take Zig, its `comptime` parameters are kind of similar. They can be used to create functions that return types or whose output type depends on the inputs, etc. It seems to fulfil the three axioms at the start, no? The erasure stuff seems just as relevant.
Can I say that `comptime` is dependent types in imperative clothing? Or is there a missing capability making it strictly weaker?
You could probably say that. AFAIK there isn't a single valid definition of a dependently typed language any more than there is a single valid definition of a functional language.
I usually go with "you can make a dependent pair", which is two terms where the type of the second depends on the value of the first; I think you could do that in Zig with a bit of handwaving around whether the comptime expression really is the first value or not.
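For concreteness, a minimal sketch of a dependent pair in Idris (the names here are mine, purely for illustration):

```
-- The type of the second component is computed from the value of the first.
payload : Bool -> Type
payload True  = Nat
payload False = String

example1 : (b : Bool ** payload b)
example1 = (True ** 42)

example2 : (b : Bool ** payload b)
example2 = (False ** "hi")
```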
What Zig's doing is also, as best I understand it all, the only real path to anything like dependent types in imperative languages: make a separate sub-language for constant expressions that has its own evaluation rules that run at compile time. See also C++ constexpr, or the constants permitted in a Rust static array size parameter (practically a dependent type itself!)
A functional language's basic building block is an expression and terms are built out of composed expressions; normalisation means reducing a term as far as possible, and equality is (handwaving!) when two terms reduce to exactly the same thing. So you can do your dependent terms in the usual source language with the usual evaluation rules, and those terms are freely usable at runtime or compile time, it's all the same code.
An imperative language's basic building block is a memory peek or poke, with programs composed of sequences of peeks and pokes scheduled by control flow operators. It's much less clear what "normalisation" looks like here, so imperative languages need that separate constant-value embedded language.
Zig's sub-language and evaluation rules are exactly the same as the Zig language except that `type` is a first class type and the TypeInfo of any type can be both examined and modified. In Zig, generics are implemented by computing types--most often by a function returning a struct that is specialized to a type argument to the function.
I've been writing some D code and, while D "template" functions can take compile-time arguments that are types, they can only produce types via mixins, which are strings that are injected at the point of invocation and then parsed by the compiler. This gives you all of the power of Zig and then some, but it's not nearly as elegant (it would help a lot if D's token quote construct q{...} had lisp-like quasi-quote functionality).

OTOH, I found when writing Zig it could become extremely confusing to have the comptime and runtime languages intermixed, which is often done for loop unrolling and "conditional compilation" -- whether it's a comptime or runtime operation is in many cases a function of whether the variables involved are comptime or runtime types.

In D it's generally clearer because comptime operations are distinguished with such things as static if for flow control, enum for declaring comptime variables, foo!(comptime args)(runtime args) for function calls, etc.
1 reply →
I'm not familiar with Zig, but I'm pretty sure comptime is not quite as powerful.
In the post, I didn't give any examples of situations where the input to a dependent function was not known at compile time. However, that was just due to the lack of time when writing the post.
I gave this example:
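Roughly, in the post's Idris-style syntax (the exact bodies here are a sketch of the idea):

```
-- pickType computes a type from a Bool...
pickType : Bool -> Type
pickType True  = Nat
pickType False = String

-- ...and getValue's result type depends on its argument.
getValue : (b : Bool) -> pickType b
getValue True  = 42
getValue False = "hello"
```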
The difference is that nothing in dependent types requires that the depended-on value `b` is known at compile time. You could pass any Bool to `getValue`, including one that was read as input from the user. It would just be the case that, before using the value returned by `getValue b`, you'd have to check (at runtime) the value of `b`. Only then would you know `pickType b`, which would tell you the type of `getValue b`. My apologies that the post was unclear about this. (I've just updated it.)
If you don't know at compile time what the type of the returned value is, how do you know whether the remainder of the expression is still correct?

The choice is either to have a total function that can deal with every possible type, checked at compile time, or to defer to the runtime and then throw an exception if the program doesn't know what to do with the value.
3 replies →
I commented this before but deleted it because I hadn't tested my code properly. Anyway, here is a version that works.
You're right Zig isn't quite as powerful, but you don't need type erasure if you have enough monomorphization. It's a bit annoying to use since you have to capture the comptime version of b inside the inline switch prongs, and get_value does need to take a comptime bool instead of a runtime one. It's functionally the same, except that instead of using the bool to prove the type of the type-erased value is correct and delaying the actual if (b) check to runtime, we're moving it to compile time and instead proving that b has the right value for each case, then generating specialized code.
This does NOT mean that it isn't dependent on user input, the call do_thing(get_user_input()) would absolutely work. do_thing has no compile-time parameters at all.
I don't have the natToString thing because it's a bit more annoying in Zig and would obscure the point.
You could define a function and change the signature if you wish.
Perhaps more convincingly, you can use inline for with a similar effect. It's obviously horrible code nobody would ever write but I think it might illustrate the point.
Not really. With dependent types, you can have a function that returns a type that depends on a runtime value; with Zig's comptime, it can only depend on a compile-time value.
Note that a language with dependent types doesn't actually "generate" types at runtime (as Zig does at compile-time). It's really about managing proofs that certain runtime objects are instances of certain types. E.g. you could have a "prime number" type, and functions that compute integers would need to include a proof that their result is a prime number if they want to return an instance of the prime number type.
Using dependent types well can be a lot of work in practice. You basically need to write correctness proofs of arbitrary properties.
Sorry if this is obvious, but do these languages let programmers explicitly assert that a certain relevant property holds at runtime?
Or do these compilers insist on working these proofs out on their own, without the benefit of programmer-supplied assertions?
2 replies →
I am slowly trying to understand dependent types, but the explanation is a bit confusing to me. I understand the mathematical terminology of a function that may return a type, but... since function types take a value and return a value, they are by definition in another universe from, say, morphisms that take a type and return a type.
In the same way, I see a value as an ur-element and types as sets of values. So even if there is syntactic sugar around the value <-> type equivalence, I'd naively think that we could instead define some type morphism, and that might be more accurate. The value parameter would merely be declared through a type parameter constrained to be a singleton, the same way an ur-element is not a set but a member of a set.
Then the question is representation, but that could be left as an optimization. Perhaps that is already what is done.
Example:

    type Nine int = {9}

And then the rest is generic functions, parameterizable by 9, or actually, Nine.
So nothing too different from a refinement of int.
Basically, 'value' would be a special constraint on a type parameter in normal parametric polymorphism implementations. There would probably be derived constraint information such as size etc...
But I guess, the same issue of "which refinement types can be defined, while keeping everything decidable" remains as an issue.
Also how to handle runtime values? That will require type assertions, just like union types? Or is it only a compile time concept and there is no runtime instantiations. Only some kind of const generics?
A typeof function could be an example of dependent type though? Even at runtime?
Just wondering...
In the style of the linked post, you'd probably define a generic type (well, one of two generic types):
type ExactlyStatic : (0 t: Type) -> (0 v: t) -> Type
type ExactlyRuntime : (0 t: Type) -> (v: t) -> Type
Then you could have the type (ExactlyStatic Int 9) or the type (ExactlyRuntime Int 9).
The difference is that ExactlyStatic Int 9 doesn't expose the value 9 to the runtime, so it would be fully erased, while (ExactlyRuntime Int 9) does.
This means that the runtime representation of the first would be (), and the second would be Int.
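One possible rendering of these as Idris 2 datatypes, with the erasure expressed as a 0 quantity on the constructor argument (my sketch, not the post's code):

```
data ExactlyStatic : (t : Type) -> t -> Type where
  -- v is erased: the runtime representation carries no Int.
  MkStatic : (0 v : t) -> ExactlyStatic t v

data ExactlyRuntime : (t : Type) -> t -> Type where
  -- v is retained: the runtime representation is (essentially) the Int.
  MkRuntime : (v : t) -> ExactlyRuntime t v

staticNine : ExactlyStatic Int 9
staticNine = MkStatic 9

runtimeNine : ExactlyRuntime Int 9
runtimeNine = MkRuntime 9
```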
> Also how to handle runtime values? That will require type assertions, just like union types?
The compiler doesn't insert any kind of runtime checks that you aren't writing in your code. The difference is that now when you write e.g. `if condition(x) then ABC else DEF`, inside the two branches you can obtain a proof that condition(x) is true/false, and propagate this.
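A minimal sketch of that in Idris 2, using choose from Data.So (describe is an illustrative name):

```
import Data.So

-- choose : (b : Bool) -> Either (So b) (So (not b))
describe : Int -> String
describe x = case choose (x > 0) of
  Left  prf => "positive"      -- prf : So (x > 0), available to propagate
  Right prf => "not positive"  -- prf : So (not (x > 0))
```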
Value representations will typically be algebraic-data-type flavored (so, often a tagged union) but you can use erasure to get more efficient representations if needed.
In type theory, all singleton types are isomorphic and have no useful distinguishing characteristics (indeed, this is true of all types of the same cardinality – and even then, comparing cardinalities is always undecidable and thus irrelevant at runtime). So your Nine type doesn’t really make sense, because you may as well just write Unit. In general, there is no amount of introspection into the “internal structure” of a type offered; even though parametricity does not hold in general (unless you postulate anticlassical axioms), all your functions that can run at runtime are required to be parametric.
Being isomorphic is not the same as being identical, or substitutable for one another. Type theory generally distinguishes between isomorphism and definitional equality, and only the latter allows literal substitution. So a Nine type with a single constructor is indeed isomorphic to Unit, but it's not the same type; it carries different syntactic and semantic meaning, and the type system preserves that.
Another false claim is that type theory does not distinguish types of the same cardinality. Type theory is usually intensional, not extensional, so two types with the same number of inhabitants can have wildly different structures, and this structure can be used in pattern matching and type inference. Cardinality is a set-theoretic notion, but most type theories are constructive and syntactic, not purely set-theoretic.
Also parametricity is a property of polymorphic functions, not of all functions in general. It's true that polymorphic code can't depend on the specific structure of its type argument but most type theories don't enforce full parametricity at runtime. Many practical type systems (like Haskell with type classes) break it with ad-hoc polymorphism or runtime types.
6 replies →
How does it become Unit if it is an integer of value 9? Why would cardinalities be undecidable if they are encoded discretely in the type?
For instance, type Nine int = {9} would not be represented as 9. It would probably be a fat pointer. It is not just an int, it would not even have the same operations (9 + 9 is 18 which is an int but is not of type Nine, but that's fine, a fat pointer does not need to share the same set of operations as the value it wraps).
It could be seen as a refinement of int? I am not sure that it can truly be isomorphic? My suspicion was that it would only be somewhat isomorphic at compile time, for type checking, and if there is a mechanism for auto unwrapping of the value?
6 replies →
Hi aatd86! We had a different thread about existence a couple days ago, and I'm just curious -- is English your second language? What's your first language? I'm guessing French, or something from Central Europe.
Thanks for humoring me!
Functions that return types are indeed at a higher level than those that don't.
Values can be seen as singleton types. The key difference is the universe they live in. From the universe of types, the level one below is perceived as values. Similarly, from the universe of kinds, types appear to be values.
> Basically, 'value' would be a special constraint on a type parameter in normal parametric polymorphism implementations
Yes this is a level constraint.
> But I guess, the same issue of "which refinement types can be defined, while keeping everything decidable" remains as an issue.
If you're dealing with fully computable types, nothing is decidable.
> Also how to handle runtime values? That will require type assertions, just like union types? Or is it only a compile time concept and there is no runtime instantiations. Only some kind of const generics?
A compiler for a dependently typed language is essentially producing programs in which the compiler itself is embedded along with its input. There cannot be a distinction between runtime and compile time, because in general type checking requires you to be able to run the program. Compilation essentially becomes deciding which parts you want to evaluate now and which parts you want to defer until later.
> A typeof function could be an example of dependent type though? Even at runtime?
Typeof is just const:

typeof : (T : Type) -> (x : T) -> Type
typeof T x = T
Final note: I've written some compilers for toy dependently typed languages. Dependent typing by far makes both the language and the compiler easier, not harder. This is because Haskell, C++, and other languages with type systems and metaprogramming or generics actually contain several languages: the value language that we are familiar with, but also the type language.
In Haskell, this is the class/instance language, which is another logic programming language atop the lambda calculus language. This means that to write a Haskell compiler you have to write a compiler for Haskell and for the logic programming language (which is Turing-complete, btw) that is the type language.
Similarly in C++, you have to implement C++ AND the template system, which is a functional programming language with incredibly confusing semantics.
On the other hand, for a dependently typed language you just write one language... The types are talked about in the same way as values. The only difficulty is type inference. However, an explicit dependently typed language is actually the easiest typed compiler to write, because it's literally just checking for term equality, which is very easy! And with a dependent language, type inference is never going to be perfect, so you're a lot freer to make your own heuristic decisions and forego HM-like perfection.
Saying dependent types can be erased is like saying math only contains tautologies. Both are _technically_ true.
This made me realize TypeScript’s conditional types are closer to dependent types than I thought.
Yes. You can give pickType a type in TS. Something like pickType<B extends boolean>(b: B): B extends true ? number : string.
I’d love to read a post explaining how TS conditional types are or are not a form of dependent types. Or, I’d like to understand dependent types well enough to write that post.
I asked GPT-5 about it before writing the above comment and got a pretty interesting answer about how, for example, you can’t write a function that takes a number as an argument and returns an array of that length (length enforced by the type system). You can partly get there with tuple types but you have to enumerate every length explicitly. And it only works if you pass a number literal to the function. If you pass a variable of type number, it can’t infer the particular number it is.
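For contrast, that exact function is the canonical first example in a dependently typed language. A minimal Idris sketch (replicate' is named to avoid clashing with the library's own replicate):

```
import Data.Vect

-- The result length is part of the type, and n can be a runtime value.
replicate' : (n : Nat) -> a -> Vect n a
replicate' Z     _ = []
replicate' (S k) x = x :: replicate' k x
```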
I just tried this, and I couldn't quite get it to work.
```
type PickType<B extends boolean> = B extends true ? number : string;

function foo<T extends boolean>(t: T): PickType<T> {
  return t ? 42 : "42";
}
```
The compiler doesn't seem to get that the type of T is narrowed in the ternary branches, and thus PickType<T> is determined.
The error I get is that a number isn't assignable to PickType<T>.
> Instead of writing the type directly, we're calling pickType and using the type returned. The typechecker evaluates pickType True at compile time, sees it equals Nat, and then it knows you're saying myNat has type Nat.
But how does the compiler evaluate `pickType True` at compile time? Does the typechecker contain an interpreter for (some subset of) the language? It seems potentially circular for a compiler to invoke functions defined in the code that it's compiling.
Yes, the compiler contains an interpreter for the entire language, not just a subset. It is circular as you rightly point out, but there's no contradiction here.
Typically the interpreter limits itself to only evaluating pure functions or functions that do not perform any kind of IO, but in principle this restriction is not necessary. For example Jai's compiler allows completely arbitrary execution at compile time including even downloading arbitrary data from the Internet and using that data to make decisions about how to compile code.
Thank you for the explanation. This reminds me somewhat of an F# type provider, which can execute arbitrary code at compile time.
https://learn.microsoft.com/en-us/dotnet/fsharp/tutorials/ty...
The compilers for Zig, D, Nim, and perhaps other languages contain an interpreter for the whole language that can run at compile time. It's not circular--compiling code and executing code are two quite different things (the interpreter does not contain yet another parser--it works on the AST). You simply can do calculations the results of which can be used in compile-time only contexts such as initializers. The trick in Zig is that the compile-time interpreter supports types as first-class objects, so you can compute types and implement generics this way. In Nim, the interpreter supports computations over the AST. In D, types can be passed as generic arguments, and computed strings can be "mixed in" to the source code and compiled at that point (on-the-fly code generation).
"But we'd like to give a type to this function. In most languages (Haskell, Typescript, C++, Rust, C#, etc.) there is no way to do that.[2] "
Can't you do this with a GADT? With a GADT, the type of the output can be specified by which branch of the GADT the input matches. Seems quite a similar idea.
GADTs let you do similar things in the simplest cases; they just don't work as well for complex constraints.
Like, append: Vector n t -> Vector m t -> Vector (n+m) t. With GADTs you need to manually lift (+) to work on the types, and then provide proofs that it's compatible with the value-level.
Then you often need singletons for the cases that are more complex- if I have a set of integers, and an int x, and I want to constrain that the item is in the set and is also even, the GADT approach looks like:
doThing :: Singleton Int x -> Singleton (Set Int) s -> IsInSet x s -> IsEven x -> SomeOutput
Which looks reasonable, until you remember that the Singleton type has to have a representation which is convenient to prove things about, not a representation that is convenient/efficient to operate on, which means your proofs will be annoying or your function itself will hardly resemble a "naive" untyped implementation.
Dependent types fix this because your values are "just" values, so it would be something like
doThing :: (x : int) -> (s : set int) -> so (is_in s x) -> so (even x) -> SomeOutput
Here, x and s are just plain values, not a weird wrapper, and is_in and even are ordinary (and arbitrary) library functions.
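The append example from earlier is just as direct; in Idris it's essentially the library's (++), with no lifted (+) or compatibility proofs in sight:

```
import Data.Vect

append : Vect n a -> Vect m a -> Vect (n + m) a
append []        ys = ys
append (x :: xs) ys = x :: append xs ys
```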
I mean, you can trivially do it with a (non-G) ADT, if you're content with wrapping the return value.
OTOH you can do this in TypeScript easily without wrapping; it could return a union type `number | string`.
Isn't that just saying you can do dependent types in TypeScript easily so long as you're willing to give up type checking?
6 replies →
Well written! I learned stuff. Dependent types were new to me. Cheers.
Off the topic of dependent types, but this is way too hilarious to pass up..
> Imagine if pickType did a network request to a server and then returned whatever type the server said. Sounds like a bad idea!
Fucking .. isn't this what LSPs are?! Finally something functional programming people and I can agree on. That does sound like a bad idea.
I think it's a bit apples and oranges. I was suggesting that compilation itself should probably not involve nondeterministic network requests. If I understand LSP correctly, it only uses network requests to allow your editor to communicate with a binary that provides type information. But the compiler behavior is unchanged. Honestly LSPs seem pretty reasonable to me.
What's the fundamental difference between a syntax highlighter and the frontend of a compiler? I would argue not much, apart from the fact that people are willing to have their syntax highlighter randomly explode, but are much more concerned about their compiler randomly exploding.
15 replies →
No, LSPs return the name/metadata of a concrete type. Dependent typing means that the return type of any given function in your (static) program can depend on a runtime value, e.g. user input... In well-defined ways ofc.
So, you're saying it's outside the scope of an LSP to return information about a dependent type because it's .. not a concrete type? That sounds wrong.
I can make literally any language that has struct, enum, switch, and assert support dependent types. You make a boxed type (tagged union, algebraic datatype, variant, discriminated union, fucking, whatever), and just assert whenever you pack/unpack it on function entry/exit. I do this all the time.
In plain English, my quip boils down to 'why do we tolerate network requests in our syntax highlighters, when we don't tolerate them in our compiler frontends?'
1 reply →
This is easily the stupidest thing I've read on HN in quite a while, and your continued arrogance in the rest of the thread doesn't help. Quite an achievement for a site so full of people who have a direct line from their ass to the comment section. Congratulations!