Comment by MrMcCall

4 months ago

> The library is something like 15 years old.

It's not clear from the docs, but, yeah, I suspected that might be the case. Thanks.

> I'm not sure what you mean by this - they're signed integers. Maybe you just mean that there aren't unsigned ints in the stdlib?

Yes, that's what I mean. And doesn't that mean that it's fully unsuitable for systems programming, as this entire topic is focused on?

> The "one bit is reserved" is only true for the `int` type (which varies in size depending on the runtime between 31 and 63 bits).

I don't get it. What is it reserved for then, if the int size is determined when the runtime is built? How can that possibly affect the runtime use of ints? Or is any build of an OCaml program able to target (at compile-time) either 32- or 64-bit targets, or does it mean that an OCaml program build result is always a single format that will adapt at runtime to being in either environment?

Once again, I don't see how any of this is suitable for systems programming. Knowing one's runtime details is intrinsic at design-time for dealing with systems-level semantics, by my understanding.

> Building OCaml without a proper build system

But I don't want to build the programming language, I want to use it. Sure, I can recompile gcc if I need to, but that shouldn't be a part of my dev process for building software that uses gcc, IMO.

It looks to me like Jane Street has taken over OCaml and added a ton of apparatus to facilitate their various uses of it. Of course, I admit that I am very specific and focused on small, tightly-defined software, so multi-target software systems that lean on third-party code are not of interest to me.

It looks to me like OCaml's intrinsic install is designed to facilitate far more advanced features than I care to use, and those features seem to make it a very ill-suited choice for a systems programming language, where concise, straightforward semantics will win the day for long-term success.

Once again, it looks like we're all basically forced to fall back to C for systems code, even if our bright-eyed bushy tails can dream of nicer ways of getting the job done.

Thanks for your patient and excellent help on this topic.

3 comments

thedufer 4 months ago

> I don't get it. What is it reserved for then, if the int size is determined when the runtime is built? How can that possibly affect the runtime use of ints?

Types are fully erased after compilation of an OCaml program. However, the GC still needs to know things about the data it is looking at - for example, whether a given value is a pointer (and thus needs to be followed when resolving liveness questions) or is plain data. Values of type `int` can be stored right alongside pointers because they're distinguishable - the lowest bit is always 0 for pointers (this is free by way of memory alignment) and 1 for ints (this is the 1 bit ints give up - much usage of ints involves some shifting to keep this property without getting the wrong values).

Other types of data (such as Int64s, strings, etc) can only be handled (at least at function boundaries) by way of a pointer, regardless of whether they fit in, say, a register. Then the whole block that the pointer points to is tagged as being all data, so the GC knows there are no pointers to look for in it.
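
The tagging scheme described above can be sketched in C roughly as follows. This is only an illustration of the idea, not the OCaml runtime's actual code: the names `value`, `is_int`, `tag_int`, `untag_int`, and `tagged_add` are hypothetical, and overflow handling is left out.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative machine word: holds either an aligned pointer or a tagged int. */
typedef intptr_t value;

/* Low bit 1 means "immediate integer"; aligned pointers have low bit 0. */
static int is_int(value v) { return (v & 1) != 0; }

/* Encode n as 2n + 1. n must fit in one bit less than the word. */
static value tag_int(intptr_t n) {
    assert(n >= INTPTR_MIN / 2 && n <= INTPTR_MAX / 2);
    return 2 * n + 1;
}

/* Decode: (2n + 1 - 1) / 2 == n, exact for negative n as well. */
static intptr_t untag_int(value v) {
    assert(is_int(v));
    return (v - 1) / 2;
}

/* Add without decoding: (2a + 1) + (2b + 1) - 1 == 2(a + b) + 1.
   Overflow is not handled in this sketch. */
static value tagged_add(value a, value b) { return a + (b - 1); }
```

For example, `tag_int(5)` yields the word `11`, and `tagged_add` needs only one extra subtraction compared with a plain add, which is the kind of small fix-up that keeps the low bit set.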

> Or is any build of an OCaml program able to target (at compile-time) either 32- or 64-bit targets, or does it mean that an OCaml program build result is always a single format that will adapt at runtime to being in either environment?

To be clear, you have to choose at build time what you're targeting, and the integer size is part of that target specification (most processor architectures these days are 64-bit, for example, but compilation to JavaScript treats JavaScript as a 32-bit platform, and of course there's still support for various 32-bit architectures).
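
As a rough illustration of how the target's word size fixes the range of a one-bit-tagged `int`, here is a small C sketch. The helper names `tagged_max_int` and `tagged_min_int` are hypothetical; the only assumption is that one bit of the word goes to the tag and the rest is an ordinary two's-complement integer.

```c
#include <assert.h>
#include <stdint.h>

/* Largest value of a signed int that gives up one bit of a word_bits-wide
   word: one bit for the tag, one for the sign, word_bits - 2 for magnitude. */
static int64_t tagged_max_int(unsigned word_bits) {
    assert(word_bits >= 3 && word_bits <= 64);
    return (int64_t)((UINT64_C(1) << (word_bits - 2)) - 1);
}

/* Smallest value of the same type (two's complement). */
static int64_t tagged_min_int(unsigned word_bits) {
    return -tagged_max_int(word_bits) - 1;
}
```

With a 32-bit word this gives a maximum of 2^30 - 1 (a 31-bit int), and with a 64-bit word 2^62 - 1 (a 63-bit int), so the range is known as soon as the build target is chosen.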

> Knowing one's runtime details is intrinsic at design-time for dealing with systems-level semantics, by my understanding.

Doesn't this mean that C can't be used for systems programming? You don't know the size of `int` there, either.

> But I don't want to build the programming language, I want to use it.

I meant building OCaml code, not the compiler.

MrMcCall 4 months ago

Thanks for the fantastic explanation of how ints are handled in OCaml. I've got to say that having the low bit be the flag is a strange design decision, IMO, though I understand that aligning the pointers makes the low bit or two irrelevant for them. But, oh!, the poor ints.

All this said, thanks for putting to bed, once and for all, any notion anyone should have that OCaml can be used as a systems language. Yikes!

> Doesn't this mean that C can't be used for systems programming? You don't know the size of `int` there, either.

You know that at compile time, surely, when you set the build target, no? Even the pointer sizes. Besides, after years of C programming, I got to where I never used the nonspecific versions; if I wanted 64-bit unsigned, I would specifically typedef it at the top, and then there's no ambiguity because I specifically declared all vars. (You can see how I did the same thing in F# at the bottom of this reply.)

It makes working with printf much less problematic, where things can easily go awry. Anyway, I want my semantics to percolate down pyramid-style from a small set of definitions into larger and larger areas of dependence, but cleanly and clearly.
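
A minimal C sketch of that habit, assuming C11 and the standard `<stdint.h>` and `<inttypes.h>` headers: the alias names `tI8u` and `tI4s` and the helper `format_pair` are hypothetical, chosen only to illustrate fixed-width typedefs paired with matching printf format macros.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical project-wide aliases with explicit width and signedness. */
typedef uint64_t tI8u;
typedef int32_t  tI4s;

/* Fail the build if the alias is not the width its name promises. */
_Static_assert(sizeof(tI8u) == 8, "tI8u must be 8 bytes");

/* Format both values using the format macros that match their exact types,
   so the conversion specifiers stay correct on every target.
   Returns the number of characters written, as snprintf does. */
static int format_pair(char *buf, size_t n, tI8u big, tI4s small) {
    return snprintf(buf, n, "%" PRIu64 " %" PRId32, big, small);
}
```

Calling `format_pair(buf, sizeof buf, UINT64_C(18446744073709551615), -42)` writes `18446744073709551615 -42` into `buf`, regardless of how wide plain `int` or `long` happen to be on the target.
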

Sure, `#define`s can let you do transparent multi-targeting, but it ends up being very brittle, and the bugs are insidious.

Thanks for your excellence. It's been a joy learning from you here.

---

As an aside, here's a small part of the defs section from the final iteration of my F# base libs, where I created aliases for the various .NET types for standard use in my code:

    type tI4s = System.Int32
    type tI1s = System.SByte
    type tI2s = System.Int16
    type tI8s = System.Int64
    type tI1u = System.Byte
    type tI2u = System.UInt16
    type tI4u = System.UInt32
    type tI8u = System.UInt64

Why risk relying on implicit definitions (or inconsistent F# team alias naming conventions) when, instead, everything can be explicitly declared and thus unambiguous? (It's really helpful for syscall interop declarations, as I remember it from so many years ago.) Plus, it's far more terse, and .NET not being able to compile to a 64-bit executable (IIRC) made it simpler than C/C++'s two kinds of executable targets.

- thedufer 4 months ago
  
  > But, oh!, the poor ints.

  Empirically this is a rather low cost. IIRC, the extra ops add less than a cycle per arithmetic operation, due to amortizing them over multiple operations and clean pipelining (and also things like shifts just being really cheap).

  But yes, there are certainly applications where we almost exclusively use Int64 or Int32 rather than the primary int type, if you need exactly that many bits.

  > You know that at compile time, surely, when you set the build target, no?

  Well, that's true of OCaml as well.

  This is ultimately a difference of opinion - I think that the cost of installing a single extra library to get ints of various widths/signedness would be worth the advantage of eliminating nearly all memory errors (and various other advantages of a higher-level language).

  The main carveout I would agree with is any case where you absolutely need strict memory bounds - it's not clear to me how you'd satisfy this with any GC'd language, since the GC behavior is ultimately somewhat chaotic.