Comment by kazinator

1 day ago

Let's drop the word "program" and use something else, like "project", since the word "program" is normative in ISO C.

The "project" is all the files going into a program supplied other than by the implementation.

C programs can contain #include directives. Those #include directives can be satisfied in one of three ways. First, they can reference a standard header which is specified by ISO C and hence effectively built into the hosted language, such as <stdio.h>.

Second, C programs can #include a file from the project. For instance someone's "stack.c" includes "stack.h". So yes, there is an external file set (the project) which can have header files.

Third, C programs can #include something which is neither of the above. That something might be not found (constraint violation). Or it might be found (the implementation provides it). For instance <sys/mman.h>: not in your project, not in ISO C.

My fictitious <winkle.h> falls into this category. (It deliberately doesn't look like a common platform-specific header coming from any well-known implementation---but that doesn't matter to the point).
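
A rough sketch of the three cases, assuming a compiler that supports `__has_include` (standard in C23, available earlier as an extension in GCC and Clang). The project header "stack.h" appears only in a comment so the fragment compiles on its own, and `have_winkle` is a made-up helper name used for illustration.

```c
#include <stdio.h>      /* case 1: standard header, specified by ISO C */

/* case 2 would be a project header, e.g.  #include "stack.h"
   (left as a comment so this fragment compiles on its own) */

/* case 3: neither of the above. Probe for it instead of including it,
   since a failed #include would stop translation. */
#ifdef __has_include
#  if __has_include(<winkle.h>)
#    define HAVE_WINKLE 1
#  endif
#endif
#ifndef HAVE_WINKLE
#  define HAVE_WINKLE 0
#endif

/* Hypothetical helper: reports whether the implementation found <winkle.h>. */
int have_winkle(void)
{
    return HAVE_WINKLE;
}
```

Even when the probe reports 1, what the header contains is a matter for the implementation's documentation, not ISO C.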

> Or are you just maybe saying that the developer's file system isn't part of the ISO C standard, so this whole textual inclusion process is by some meaning not defined by the standard?

Of course the file system isn't part of the standard, but no, I'm not saying that. The C standard gives requirements as to how a program (project part and other) is processed by the implementation, including all the translation phases that include preprocessing.

To understand what the requirements are, we must consider the content of the program. We know what the content is of the project parts: that's in our files. We (usually indirectly) know the content of the standard headers, from the standard; we ensure that we have met the rules regarding their correct use and what we may or may not rely on coming from them.

We don't know the content of successfully included headers that don't come from our project or from ISO C; or, rather, we don't know that content just from knowing ISO C and our project. In ISO C, we can't find any requirements as to what is supposed to be there, and we can't find it in our project either.

If we peek into the implementation to see what #include <winkle.h> is doing (and such peeking is usually possible), we are effectively looking at a document, and then if we infer from that document what the behavior will be, it is a documented extension --- standing in the same place as what ISO C calls undefined behavior. Alternatively, we could look to actual documentation. E.g. POSIX tells us what is in <fcntl.h> without us having to look for the file and analyze the tokens. When we use it we have "POSIX-defined" behavior.

#include <winkle.h> is in the same category of thing as __asm__ __volatile__ or __int128_t or what have you.

#include <winkle.h> could contain the token __wipe_current_directory_at_compile_time which the accompanying compiler understands and executes as soon as it parses the token. Or __make_demons_fly_out_of_nose. :)

Do you see the point? When you include a nonstandard header that is not coming from your project, and the include succeeds, anything can happen. ISO C no longer dictates the requirements as to what the behavior will be. Something unexpected can happen, still at translation time.

Now headers like <windows.h> or <unistd.h> are exactly like <winkle.h>: same undefined behavior.

> The "project" is all the files going into a program supplied other than by the implementation.

Most of my most recent comment is addressing the possibility that you meant this.

As I said, there is no such concept to the compiler. It isn't passed any list of files that could be included with #include, only the .c files actually being compiled, and the directories containing includable files.

The fact that your IDE shows project files is an illusion. Any header files shown there are not treated differently by the compiler/preprocessor to any others. They can't be, because it's not told about them!

It's even possible to add header files to your IDE's project that are not in the include path, and then they wouldn't be picked up by #include. That's how irrelevant project files are to #include.
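
A toy model of that lookup, not how any real compiler is written: the searchable locations are represented as in-memory tables instead of directories, and every name here (`header_entry`, `place`, `find_header`) is hypothetical. The point it illustrates is that lookup goes by name through a sequence of places, so a header that is in none of them is simply not found.

```c
#include <stddef.h>
#include <string.h>

/* One includable header: its name and its text. */
struct header_entry {
    const char *name;
    const char *contents;
};

/* One searchable place (a stand-in for an include directory). */
struct place {
    const struct header_entry *entries;
    size_t count;
};

/* Search the places in order; return the header text, or NULL if absent. */
const char *find_header(const struct place *places, size_t nplaces,
                        const char *name)
{
    for (size_t p = 0; p < nplaces; p++) {
        for (size_t e = 0; e < places[p].count; e++) {
            if (strcmp(places[p].entries[e].name, name) == 0)
                return places[p].entries[e].contents;
        }
    }
    return NULL;
}
```

A header listed in an IDE project but stored outside every searched place corresponds to the NULL case here.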

  • There is no "compiler", "IDE" or "include path" in the wording of the ISO C standard. A set of files is somehow presented to the implementation in a way that is not specified. Needless to say, a file that is included like "globals.h" but is not the base file of a translation unit will not be indicated to the implementation as the base of a translation unit. Nevertheless it has to be somehow present, if it is required.

    It doesn't seem as if you're engaging with the standard-based point I've been making, in spite of detailed elaboration.

    > Any header files shown there are not treated differently by the compiler/preprocessor to any others.

    This is absolutely false. Headers which are part of the implementation, such as standard-defined headers like <stdlib.h> need not be implemented as files. When the implementation processes #include <stdlib.h>, it just has to flip an internal switch which makes certain identifiers appear in their respective scopes as required.

    For that reason, if an implementation provides <winkle.h>, there need not be such a file anywhere in its installation.
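
    A minimal sketch of the "internal switch" idea, assuming a made-up implementation that keeps a table of header names it satisfies without reading any file. The table contents and the name `is_builtin_header` are illustrative only and do not describe any real compiler.

    ```c
    #include <stddef.h>
    #include <string.h>

    /* Hypothetical: header names this toy implementation handles internally. */
    static const char *const builtin_headers[] = {
        "stdio.h", "stdlib.h", "winkle.h"
    };

    /* Returns 1 if the name is satisfied without any file lookup, else 0. */
    int is_builtin_header(const char *name)
    {
        size_t n = sizeof builtin_headers / sizeof builtin_headers[0];
        for (size_t i = 0; i < n; i++) {
            if (strcmp(name, builtin_headers[i]) == 0)
                return 1;
        }
        return 0;
    }
    ```

    In this model, "winkle.h" is accepted even though no such file exists anywhere, while a project header such as "stack.h" would still have to be found some other way.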

    • I only discussed things like include directories and IDEs, which are not part of the standard, because I am trying in good faith to understand how you could have come to your position. There is nothing in the standard like the "set of files is somehow presented to the implementation" (in a sense that includes header files) so I reasoned that maybe you were thinking of something outside the standard.

      Instead, the standard says that the include directive:

      > searches a sequence of implementation-defined places for a header ... and causes the replacement of that directive by the entire contents of the header.

      (Note that it talks about simply substituting in text, not anything more magical, but that's digressing.)

      It's careful to say "places" rather than "directories" to avoid the requirement that there's an actual file system, but the idea is the same. You don't pass the implementation every individual file that might need to be included, you pass in the places that hold them and a way to search them with a name.

      Maybe you were confused by that part of the standard you quoted in an earlier comment.

      One part of that says "The text of the program is kept in units called source files, (or preprocessing files) in this document." But the "source files" aren't really relevant to the include directive – those are the top-level files being compiled (what you've called "base files").

      The next sentence you quoted says "A source file together with all the headers and source files included via the preprocessing directive #include is known as a preprocessing translation unit." But "all the headers" here is just referring to files that have been found by the search mechanism referred to above, not some explicit list.
