Comment by kazinator

1 day ago

I see what you are getting at. Programs consist of materials that are presented to the implementation, and also of materials that come from the implementation.

So what I mean is that no file matching <winkle.h> has been presented as part of the external file set given to the implementation for processing.

I agree that if such a file is found by the implementation, it becomes part of the program, as makes sense and as that word is defined by ISO C. So it is not the right terminology to say that the file is not part of the program, yet may be found.

If the inclusion is successful, though, the content of that portion of the program is not defined by ISO C.

It still seems like you have invented some notion of "program" that doesn't really exist. Most suspicious is when you say this:

> So what I mean is that no file matching <winkle.h> has been presented as part of the external file set given to the implementation for processing.

The thing is, there is no "external file set" that includes header files, so this sentence makes no sense.

Note that when the preprocessor is run, the only inputs are the file being preprocessed (i.e., the .c file) and the list of directories to find include files (called the include path). That's not really part of the ISO standard, but it's almost universal in practice. Then the output of the preprocessor is passed to the compiler, and now it's all one flat file so there isn't even a concept of included files at this point. The object files from compilation are then passed to the linker, which again doesn't care about headers (or indeed the top-level source files). There are more details in practice (especially with libraries) but that's the essence.

I wonder if your confusion is based on seeing header files in some sort of project-like structure in an IDE (like Visual Studio). But those are just there for ease of editing: the compiler (or preprocessor) doesn't know or care which header files are in your IDE's project; it only cares about the directories in the include path. The same applies to CMake targets: you can add header files with target_sources(), but that's just to make them show up in any generated IDE projects; it has no effect on compilation.
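For what it's worth, the point that only the include path matters can be seen from inside a translation unit. The sketch below assumes a compiler that supports __has_include (standard since C23, and available earlier as an extension in GCC and Clang). It asks the preprocessor whether the fictitious <winkle.h> can be found, without committing to a #include that would fail if it can't. The macro WINKLE_AVAILABLE and the function winkle_available() are hypothetical names made up for this example.

```c
/* Hypothetical probe: WINKLE_AVAILABLE and winkle_available() are names
   made up for this example; <winkle.h> is the fictitious header from this
   thread. Assumes __has_include support (C23, or GCC/Clang extension). */
#if defined __has_include
#  if __has_include(<winkle.h>)
#    define WINKLE_AVAILABLE 1 /* found in some directory on the include path */
#  else
#    define WINKLE_AVAILABLE 0 /* not found: a plain #include would fail */
#  endif
#else
#  define WINKLE_AVAILABLE 0   /* no __has_include support; assume absent */
#endif

/* Returns the answer the preprocessor computed at translation time. */
int winkle_available(void)
{
    return WINKLE_AVAILABLE;
}
```

Putting a directory that contains a winkle.h on the search path (for example with -I on GCC or Clang) should flip the result to 1. Listing the file in an IDE project or in target_sources() without adding its directory to the include path would not.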

Or are you just maybe saying that the developer's file system isn't part of the ISO C standard, so this whole textual inclusion process is by some meaning not defined by the standard? If so, I don't think that matches the conventional meaning of undefined behaviour.

If it's neither of those, could you clarify what exactly you mean by "the external file set given to the implementation for processing"?

  • Let's drop the word "program" and use something else, like "project", since the word "program" is normative in ISO C.

    The "project" is all the files going into a program supplied other than by the implementation.

    C programs can contain #include directives. Those #include directives can be satisfied in one of three ways: they can reference a standard header which is specified by ISO C and hence effectively built into the hosted language, such as <stdio.h>.

    C programs can #include a file from the project. For instance someone's "stack.c" includes "stack.h". So yes, there is an external file set (the project) which can have header files.

    C programs can also #include something which is neither of the above. That something might not be found (constraint violation). Or it might be found (the implementation provides it). For instance <sys/mman.h>: not in your project, not in ISO C.

    My fictitious <winkle.h> falls into this category. (It deliberately doesn't look like a common platform-specific header coming from any well-known implementation, but that doesn't matter to the point.)

    > Or are you just maybe saying that the developer's file system isn't part of the ISO C standard, so this whole textual inclusion process is by some meaning not defined by the standard?

    Of course it isn't, but no, I'm not saying that. The C standard gives requirements as to how a program (project part and other) is processed by the implementation, including all the translation phases that include preprocessing.

    To understand what the requirements are, we must consider the content of the program. We know what the content is of the project parts: that's in our files. We (usually indirectly) know the content of the standard headers, from the standard; we ensure that we have met the rules regarding their correct use and what we may or may not rely on coming from them.

    We don't know the content of successfully included headers that don't come from our project or from ISO C; or, rather, we don't know that content just from knowing ISO C and our project. In ISO C, we can't find any requirements as to what is supposed to be there, and we can't find it in our project either.

    If we peek into the implementation to see what #include <winkle.h> is doing (and such peeking is usually possible), we are effectively looking at a document; if we then infer from that document what the behavior will be, we are relying on a documented extension, standing in the same place as what ISO C calls undefined behavior. Alternatively, we could look to actual documentation. E.g. POSIX tells us what is in <fcntl.h> without us having to look for the file and analyze the tokens. When we use it, we have "POSIX-defined" behavior.

    #include <winkle.h> is in the same category of thing as __asm__ __volatile__ or __int128_t or what have you.

    #include <winkle.h> could contain the token __wipe_current_directory_at_compile_time which the accompanying compiler understands and executes as soon as it parses the token. Or __make_demons_fly_out_of_nose. :)

    Do you see the point? When you include a nonstandard header that is not coming from your project, and the include succeeds, anything can happen. ISO C no longer dictates the requirements as to what the behavior will be. Something unexpected can happen, still at translation time.

    Now headers like <windows.h> or <unistd.h> are exactly like <winkle.h>: same undefined behavior.

    • > The "project" is all the files going into a program supplied other than by the implementation.

      Most of my previous comment addresses the possibility that you meant this.

      As I said, there is no such concept to the compiler. It isn't passed any list of files that could be included with #include, only the .c files actually being compiled and the directories containing includable files.

      The fact that your IDE shows project files is an illusion. Any header files shown there are not treated differently by the compiler/preprocessor to any others. They can't be, because it's not told about them!

      It's even possible to add header files to your IDE's project that are not in the include path, and then they wouldn't be picked up by #include. That's how irrelevant project files are to #include.
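To make the comparison between #include <winkle.h> and extensions such as __int128_t concrete, here is a minimal sketch, assuming GCC or Clang, which provide unsigned __int128 and define __SIZEOF_INT128__ when it is available. The function name mul_u64_wide is made up for illustration. On the extension path, the meaning of the code comes from the compiler's documentation rather than from ISO C; the #else branch computes the same 128-bit product in plain ISO C.

```c
#include <stdint.h>

/* Hypothetical helper: computes the full 128-bit product of a and b as two
   64-bit halves. unsigned __int128 is a GCC/Clang extension, not ISO C;
   those compilers define __SIZEOF_INT128__ when the type is available. */
void mul_u64_wide(uint64_t a, uint64_t b, uint64_t *hi, uint64_t *lo)
{
#ifdef __SIZEOF_INT128__
    /* Extension path: behavior is documented by the compiler, not ISO C. */
    unsigned __int128 p = (unsigned __int128)a * b;
    *hi = (uint64_t)(p >> 64);
    *lo = (uint64_t)p;
#else
    /* ISO C path: schoolbook multiplication on 32-bit halves. */
    uint64_t a_lo = a & 0xFFFFFFFFu, a_hi = a >> 32;
    uint64_t b_lo = b & 0xFFFFFFFFu, b_hi = b >> 32;
    uint64_t p0 = a_lo * b_lo; /* weight 2^0  */
    uint64_t p1 = a_lo * b_hi; /* weight 2^32 */
    uint64_t p2 = a_hi * b_lo; /* weight 2^32 */
    uint64_t p3 = a_hi * b_hi; /* weight 2^64 */
    /* Everything landing in bits 32..63; bits above 31 of mid are carries. */
    uint64_t mid = (p0 >> 32) + (p1 & 0xFFFFFFFFu) + (p2 & 0xFFFFFFFFu);
    *lo = (p0 & 0xFFFFFFFFu) | (mid << 32);
    *hi = p3 + (p1 >> 32) + (p2 >> 32) + (mid >> 32);
#endif
}
```

Compiling with GCC's -pedantic option should produce a diagnostic on the __int128 line, which is the compiler acknowledging that the construct lies outside ISO C.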
