Comment by thedufer

10 months ago

> I don't get it. What is it reserved for then, if the int size is determined when the runtime is built? How can that possibly affect the runtime use of ints?

Types are fully erased after compilation of an OCaml program. However, the GC still needs to know things about the data it is looking at - for example, whether a given value is a pointer (and thus needs to be followed when resolving liveness questions) or is plain data. Values of type `int` can be stored right alongside pointers because they're distinguishable - the lowest bit is always 0 for pointers (this comes for free from memory alignment) and 1 for ints (this is the one bit ints give up - many int operations involve some shifting to preserve this property without producing wrong values).
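
To make the low-bit scheme concrete, here is a minimal sketch in C. The names (`value`, `tag_int`, `untag_int`, `is_int`, `is_pointer`) are hypothetical and chosen for illustration; the real OCaml runtime has its own macros for this. It also assumes that right-shifting a negative signed integer is an arithmetic shift, which holds on mainstream compilers but is implementation-defined in C.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names for illustration; not the OCaml runtime's API. */
typedef intptr_t value; /* one machine word: a tagged int or an aligned pointer */

/* Low bit 1 means "immediate integer", low bit 0 means "pointer". */
static int is_int(value v)     { return (v & 1) != 0; }
static int is_pointer(value v) { return (v & 1) == 0; }

/* An integer n is stored as 2n + 1. The shift is done on the unsigned
   type because left-shifting a negative signed value is undefined in C. */
static value tag_int(intptr_t n) {
    return (value)(((uintptr_t)n << 1) | 1u);
}

/* Dropping the tag bit restores n (assumes an arithmetic right shift). */
static intptr_t untag_int(value v) {
    assert(is_int(v));
    return v >> 1;
}
```

With this encoding, 21 is stored as the word 43, and any address of an object aligned to at least 2 bytes has its low bit clear, so a collector can tell the two apart without any type information.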

Other types of data (such as Int64s, strings, etc.) can only be handled (at least at function boundaries) by way of a pointer, regardless of whether they would fit in, say, a register. The whole block that the pointer points to is then tagged as being all data, so the GC knows there are no pointers to look for in it.
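
A rough sketch of that idea in C follows. The `block` struct, its `opaque` flag, and `pointers_to_follow` are all hypothetical and heavily simplified; this is not the real OCaml block header layout, only an illustration of "the block says whether the GC should scan it".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef intptr_t value; /* tagged int (low bit 1) or pointer (low bit 0) */

/* Simplified, hypothetical block layout; not the real OCaml header. */
typedef struct {
    size_t size;      /* number of words used in fields[] */
    int    opaque;    /* nonzero: raw data only, the GC must not scan it */
    value  fields[2]; /* fixed at two words to keep the sketch short */
} block;

/* Count the fields a collector would have to follow from this block. */
static size_t pointers_to_follow(const block *b) {
    assert(b != NULL);
    if (b->opaque)
        return 0; /* e.g. an Int64 or string payload: bits are never inspected */
    size_t n = 0;
    for (size_t i = 0; i < b->size; i++)
        if ((b->fields[i] & 1) == 0)
            n++;  /* low bit 0: treat the word as a pointer */
    return n;
}
```

The point of the flag is that raw payload words may have any bit pattern, including ones that look like pointers, so the decision has to be made per block rather than per word.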

> Or is any build of an OCaml program able to target (at compile-time) either 32- or 64-bit targets, or does it mean that an OCaml program build result is always a single format that will adapt at runtime to being in either environment?

To be clear, you have to choose at build time what you're targeting, and the integer size is part of that target specification (most processor architectures these days are 64-bit, for example, but compilation to JavaScript treats JavaScript as a 32-bit platform, and of course there's still support for various 32-bit architectures).

> Knowing one's runtime details is intrinsic at design-time for dealing with systems-level semantics, by my understanding.

Doesn't this mean that C can't be used for systems programming? You don't know the size of `int` there, either.

> But I don't want to build the programming language, I want to use it.

I meant building OCaml code, not the compiler.

MrMcCall 10 months ago

Thanks for the fantastic explanation of how ints are handled in OCaml. I've got to say that having the low bit be the flag is a strange design decision, IMO, though I understand that aligning the pointers makes the low bit or two irrelevant for them. But, oh!, the poor ints.

All this said, thanks for putting to bed, once and for all, any notion anyone should have that OCaml can be used as a systems language. Yikes!

> Doesn't this mean that C can't be used for systems programming? You don't know the size of `int` there, either.

You know that at compile time, surely, when you set the build target, no? Even the pointer sizes. Besides, after years of C programming, I got to where I never used the nonspecific versions; if I wanted a 64-bit unsigned type, I would specifically typedef it at the top, and then there's no ambiguity because I specifically declared all vars. (You can see how I did the same thing in F# at the bottom of this reply.)

It makes working with printf much less problematic, where things can easily go awry. Anyway, I want my semantics to percolate down pyramid-style from a small set of definitions into larger and larger areas of dependence, but cleanly and clearly.
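
For readers unfamiliar with the pattern, a minimal C sketch might look like the following. The alias names (`u64`, `i32`) and the helper `format_u64` are hypothetical, not taken from the code described above; `uint64_t`, `int32_t`, and `PRIu64` are standard C99.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical alias names: the width and signedness are fixed in one place. */
typedef uint64_t u64;
typedef int32_t  i32;

/* Format a u64 into buf. PRIu64 expands to the correct printf conversion
   for the target, so the format string cannot drift from the type. */
static int format_u64(char *buf, size_t len, u64 x) {
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "%" PRIu64, x);
}
```

Because the format macro is tied to the exact-width type, the same source stays correct whether `long` is 32 or 64 bits on the build target.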

Sure, `#define`s can let you do transparent multi-targeting, but it ends up being very brittle, and the bugs are insidious.

Thanks for your excellence. It's been a joy learning from you here.

---

As an aside, here's a small part of my defs section from the final iteration of my F# base libs, where I created an alias for the various .NET types for standard use in my code:

   type tI4s = System.Int32
   type tI1s = System.SByte
   type tI2s = System.Int16
   type tI8s = System.Int64

   type tI1u = System.Byte
   type tI2u = System.UInt16
   type tI4u = System.UInt32
   type tI8u = System.UInt64

Why risk relying on implicit definitions (or inconsistent F# team alias naming conventions) when, instead, everything can be explicitly declared and thus unambiguous? (It's really helpful for syscall interop declarations, as I remember it from so many years ago.) Plus, it's far more terse, and .NET not being able to compile to a 64-bit executable (IIRC) made it simpler than C/C++'s two kinds of executable targets.

thedufer 10 months ago

> But, oh!, the poor ints.

Empirically this is a rather low cost. IIRC, the extra ops add less than a cycle per arithmetic operation, due to amortizing them over multiple operations and clean pipelining (and also things like shifts just being really cheap).

But yes, there are certainly applications where we almost exclusively use Int64 or Int32 rather than the primary int type, if you need exactly that many bits.
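
As a rough illustration of what "extra ops" means here, the sketch below does arithmetic directly on tagged words, assuming an int n is stored as 2n + 1. The function names are hypothetical, this is not a claim about the exact instructions the OCaml compiler emits, and it assumes an arithmetic right shift for negative values.

```c
#include <assert.h>
#include <stdint.h>

typedef intptr_t value; /* a tagged int n is stored as 2n + 1 */

static value tag_int(intptr_t n) {
    return (value)(((uintptr_t)n << 1) | 1u);
}

/* (2a + 1) + (2b + 1) - 1 = 2(a + b) + 1: one extra subtraction. */
static value tagged_add(value x, value y) {
    assert((x & 1) && (y & 1));
    return x + y - 1;
}

/* a * (2b) + 1 = 2ab + 1: untag one operand, clear the other's tag bit. */
static value tagged_mul(value x, value y) {
    assert((x & 1) && (y & 1));
    return (x >> 1) * (y - 1) + 1;
}
```

Neither operation needs to untag both operands and retag the result, which is consistent with the overhead being a single cheap instruction or two per operation.
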

> You know that at compile time, surely, when you set the build target, no?

Well, that's true of OCaml as well.

This is ultimately a difference of opinion - I think that the cost of installing a single extra library to get ints of various widths/signedness would be worth the advantage of eliminating nearly all memory errors (and various other advantages of a higher-level language).

The main carveout I would agree with is any case where you absolutely need strict memory bounds - it's not clear to me how you'd satisfy this with any GC'd language, since the GC behavior is ultimately somewhat chaotic.