Comment by quelsolaar

3 years ago

I represent Sweden in the ISO WG14, and I voted for the inclusion of Embed into C23. It's a good feature. But it's not a necessary feature and I think JeanHeyd is wrong in his criticism of the pace of WG14 work. I have found everyone in WG14 to be very hardworking and serious about their work.

C's main strength is its portability and simplicity. Therefore we should be very conservative, and not add anything quickly. There are plenty of languages to choose from if you want a "modern" language with lots of conveniences. If you want a truly portable language there is really only C. And when I say truly, I mean for platforms without file systems, or operating systems or where bytes aren't 8 bits, that don't use ASCII or Unicode, where NULL isn't at address 0 and so on.

We are the stewards of this, and the work we put in, while large, is tiny compared to the impact we have. Any change we make needs to be addressed by every compiler maintainer. There are millions of lines of code that depend on every part of the standard. A 1% performance loss is millions of tons of CO2 released, and billions in added hardware and energy costs.

In this privileged position, we have to be very mindful of the concerns of our users, and take the time to look at every corner case in detail before adding any new features. If we add something, then people will depend on its behavior, no matter how bad, and we will therefore have great difficulty in fixing it in the future without breaking our users' work, so we have to get it right the first time.
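
A minimal sketch of what coding against those guarantees can look like, for readers who have only met 8-bit-byte machines. The function names `uint_storage_bits` and `null_is_all_zero_bits` are invented for this illustration; the only facts taken from the standard are that `CHAR_BIT` is at least 8 and that the stored representation of a null pointer is not required to be all-zero bytes.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* The standard only promises CHAR_BIT >= 8; some DSP toolchains use 16 or 32. */
static_assert(CHAR_BIT >= 8, "guaranteed by the C standard");

/* Bits of storage in an unsigned int, without assuming 8-bit bytes. */
static size_t uint_storage_bits(void) {
    return sizeof(unsigned int) * CHAR_BIT;
}

/* Returns 1 if a null pointer happens to be stored as all-zero bytes on
 * this implementation. Portable code should not depend on the answer,
 * which is why memset() is not a portable way to null out pointers. */
static int null_is_all_zero_bits(void) {
    void *p = NULL;
    unsigned char zeros[sizeof p];
    memset(zeros, 0, sizeof zeros);
    return memcmp(&p, zeros, sizeof p) == 0;
}
```

On a typical desktop both answers are unsurprising (32 and 1); the point is only that the source does not bake those answers in.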

> for platforms without file systems, or operating systems or where bytes aren't 8 bits, that don't use ASCII or Unicode, where NULL isn't at address 0 and so on.

This seems totally misconceived to me as a basis for standardizing a language in 2022. You are optimizing for the few at the expense of the many.

I get that these strange architectures need a language. Why does it have to be C or C++? They can use a nonstandardized variant of C, but why hobble the language that is used on normal hardware 99% of the time with misfeatures that are justified by truly obscure platforms?

  • It doesn't have to be C, but as of today there is no other option. No one is coming up with new languages with these kinds of features so C it is. People should, but language designers today are more interested in memory safety and clever syntax, than portability.

    I would like to caution you against thinking that these weird platforms are old machines from the 60s that only run in museums. For instance, many DSPs have 32-bit bytes (a byte being the smallest memory unit that can be individually addressed), so if you have a pair of new fancy noise canceling headphones, then it's not unlikely you are wearing a platform like that on your head every day.

    • Unusual platforms like DSPs usually have specific (usually proprietary) toolchains. Why can't those platforms implement extensions to support 32-bit bytes? Why must everyone else support them? In practice ~no C code is portable to machines with 32-bit bytes. That's okay! You don't choose a DSP to run general purpose code. You choose it to run DSP code, usually written for a specific purpose, often in assembly.

    • Perhaps Carbon is the first in a series of new low level languages that free us from the impossible tensions of C/C++ having to be all things to all (low level) programmers.

      I would love a new language for implementing high level languages. I've worked on several of these projects and we use mostly unstandardized dialects of C++ and it's really not fit for purpose.

    • > It doesn't have to be C, but as of today there is no other option

      Isn’t C99 an option? Why can’t more advanced things go into newer C and people who genuinely need something more basic can use C99.

    • If it were to focus on stability, it would probably be LLVM IR. That said, there's plenty of C++ being written for these applications. And Ada.

      > so if you have a pair of new fancy noise canceling headphones, then it's not unlikely you are wearing a platform like that on your head every day.

      Chip shortage aside, the likelihood of these devices using obscure hardware like discrete DSPs is going down as cheaper low power architectures are becoming commoditized.

  • > This seems totally misconceived to me as a basis for standardizing a language in 2022. You are optimizing for the few at the expense of the many.

    Sure, but it's the same line of reasoning that made C relevant in the first place, and keeps it relevant today - some library your dad wrote for a PDP-whatever is still usable today on your laptop running Windows 10.

    Because it's antiquated, it's also extremely easy to support, and to port to new and/or exotic platforms.

    • The library my dad wrote (lol) for the PDP-11 is probably full of undefined behaviour and won't work now that optimizers are using any gap in the standard to miscompile code.

    • > PDP-whatever is still usable today on your laptop running Windows 10

      No, it isn't. Go on. Go ahead and try.

      See it break in a million weird ways. (Or, for a start, it will have the K&R C format, which is a pain to maintain)

      "If your computer doesn't have 8-bit bytes" in this day and age? It belongs in a dumpster, sorry.

      (I think the only "modern" arch that does this is PIC, and even only for program data - where you're not running anything "officially" C89 or later)

  • C is pretty much the only language in common use for programming microcontrollers. Microcontrollers seldom have filesystems. To break the language on systems without filesystems or terminals means to break the software of pretty much every electronics manufacturer out there.

  • As the GP post comments, if you want those features there are plenty of other languages to choose from.

    I don’t even like programming in C but I respect what the committee is trying to do, and yes I do sometimes write C code.

    • I'll flip that around: if you want to serve on a language standards committee, there are a lot of other languages to choose from. Why be on the C standards committee with the express purpose of blocking progress?

  • I would say that one should be pretty cautious when baking in assumptions about such a fleeting thing as hardware into such a lasting thing as a language.

    C itself carries a lot of assumptions about computer architecture from the PDP-9 / PDP-11 era, and this does hold current hardware back a bit: see how well the cool nonstandard and fast Cell CPU fared.

    A language standard should assume as little about the hardware as possible, while also, ideally, allowing to describe properties of the hardware somehow. C tries hard, but the problem is not easy at all.

  • It’s worse: almost all of them already use a nonstandard variant of C. The committee is bending over backwards to accommodate them, but they literally _do not care what the standard says_, so this doesn’t even benefit them. Most will keep using a busted C89 toolchain with a haphazard mix of extensions no matter what the standard does.

    • Even if they fork the language, the standard still provides the common baseline for all, which is useful.

  • This is purely compiler-side, and usually those esoteric hosts are not running the compiler but are being cross-compiled for, aren't they?

This reasoning has always rung mostly hollow for compiler features (#embed, typeof) rather than true language features (VLAs, closures).

Modern toolchains must exist for marginal systems. It's understandable to want to write code for a machine from 1975, or a bespoke MCU, on a modern Thinkpad. It is not necessary to support a modern compiler running on the machine from 1975 / bespoke MCU. You might as well argue against readable diagnostic messages because some system out there might not be able to print them!

  • I could also see this, though perhaps it's a step too far for C, applying to Unicode encoding of source files.

    The 1970s mainframe this program will run on has no idea that Unicode exists. Fine. But, the compiler I'm using, which must have been written in the future after this was standardised, definitely does know that Unicode exists. So let's just agree that the program's source code is always UTF-8 and have done with it.

    Jason Turner has a talk where the big reveal is, the reason the slides were all retro-looking was that they were rendered in real time on a Commodore 64. The program to do that was written in modern C++ and obviously can't be compiled on a Commodore 64 but it doesn't need to be, the C64 just needs to run the program.

    • This seems a step too far for me. Compatibility with existing source files which may not be trivial to migrate does also matter. (Well, except for `auto`, C23 was right to fuck with that.) At the very least you'll need flags that mean "do whatever you did before".

> And when I say truly, I mean for platforms without file systems

Are we really talking about compiling on such platforms? And if that's the case, how would #include work but not #embed?

  • No, I'm mainly talking about targeting. My point is not so much about embed, but rather that almost anything you think you know about how computers work isn't necessarily true, because C targets such a wide group of platforms. Almost always when someone raises a question along the lines of "No platform has ever done that, right?", someone knows of a platform that has done that, and it turns out it has very good reasons for doing that.

    For this reason, everything is much more complicated than you first think. For me, joining the WG14 has been an amazing opportunity to learn the depths of the language. C is not big but it is incredibly deep. The answer to "Why does C not just do X?" is almost always far more complicated and thought through than one thinks.

    Everyone in the WG14 who has been around for a while knows this, and therefore assumes that even the simplest addition will cause problems, even if they can't come up with a reason why.

    • Yeah, but then I have to side with the author - how could a compile time only feature which doesn't even introduce new language semantics possibly be affected by the multitude of build targets?

      Unless "it's more complicated than you think" is the catchall answer to any and all proposals for new language features. In which case, how to make progress at all?

      Also, I find the point about the language being "truly portable" a bit ironic, considering the whole rationale of #embed was that the use case of "embed large chunks of binary data in the executable" was completely non-portable and required adding significant complexity to the build scripts if you were targeting multiple platforms.

      It's easy to make a language portable on paper if you simply declare the non-portable parts to not be your responsibility.

      > Everyone in the WG14 who has been around for a while knows this, and therefore assumes that even the simplest addition will cause problems, even if they can't come up with a reason why.

      That's not something to be proud of.

    • I was on X3J11, the ANSI committee that created the original C standard and my experience was similar. It was a great opportunity to learn C at depth and get an understanding of many of the subtle details. We rejected a great many suggestions because our mandate was to standardize existing practice, address some problem areas, and not get too creative. (We occasionally did get too creative. The less said about noalias the better.)

    • Maybe you can answer a question I have: what companies are still supporting C compilers for sign-magnitude and 1s-complement machines today? I've been programming for almost 40 years now, and I have never come across any machine that is sign-magnitude or 1s-complement (I have encountered real analog computers---a decent sized one too---about 9' (3m) long, 6' (2m) high, and about 3' (1m) deep, requiring hundreds of patch cables to program).

"""Codify existing practice to address evident deficiencies. Only those concepts that have some prior art should be accepted. (Prior art may come from implementations of languages other than C.) Unless some proposed new feature addresses an evident deficiency that is actually felt by more than a few C programmers, no new inventions should be entertained."""

Source: Rationale for International Standard — Programming Languages — C https://www.open-std.org/jtc1/sc22/wg14/www/C99RationaleV5.1...

I don't know if this rationale is still followed, but I think it applies here. We need to be cautious when adding new features to C.

  • well, basic string support would be fine, wouldn't it? the C standard still having no proper string library for decades didn't harm its popularity, but still.

    you cannot find non-normalized substrings (strings are Unicode nowadays), utf-8 is unsupported. coreutils and almost all tools don't have proper string (=Unicode) support.

> where NULL isn't at address 0

Isn't there literally a single GPU for which it is true?

Asking because every time this surfaces, someone inevitably asks for an example, and the only example I've seen over the years was of one specific (Nvidia?) GPU that uses NULL of 0xFFFFFFFA (or something similar).

That is, do you know how common it is for NULL to not be 0?

People who call C simple have some weird definition of simple. How many C programs contain UB or are pure UB? Probably over 95%. The language is not simple at all.

  • A straight razor is simple and that's why it's the easiest to cut yourself with. An electric razor is much safer precisely because much engineering went into its creation.

Thank you for your post!

Thank you especially for reminding everybody that programming is much more than web programming and information systems.

  • Thank you,

    It's also worth remembering that a lot of higher-level languages have runtimes / VMs that are implemented in C. Web applications rely heavily on databases, JavaScript VMs, network stacks, system calls and operating system features, all of which are implemented in C.

    If you are a software developer and want to do something about climate change, consider becoming a compiler engineer. If you manage to get a couple of tenths of a percent performance increase in one of the big compilers during your career, you will have materially impacted global warming. Compiler engineers are the unsung heroes of software engineering.

    • > If you manage to get a couple of tenths of a percent performance increase in one of the big compilers during your career, you will have materially impacted global warming.

      I've heard this kind of claim a number of times and I think it's more complicated than the crude statistical measurement makes it sound. Personally, I think that most programs are not run frequently enough to matter from an emissions perspective. For programs that are, like ML training programs, users will just train on more data if the algorithms are faster, so most energy efficiencies will get wiped out by the increased usage.

      Even if that theory is wrong, what if there is a language that is 10% better than C for 95% of common C use cases? Wouldn't it be better for compiler engineers to focus on developing that language than micro-optimizing C?

    • No JavaScript VM is implemented in C. They are all written in a language that's a bit like C++ but has no exceptions and relies on lots of compiler behaviour that is not defined by the C++ standard.

How would such a platform without file systems handle #include?

Reading further, I don't think this was ever addressed when someone else brought it up. I cannot for the life of me imagine a system where #include works but #embed doesn't. Again, it's fine if some systems have non-standard subsets of the C standard... why hobble the actual standard for code which can be compiled on systems where you have a filesystem (that will handle #include, by the way) for the sake of systems without filesystems?

  • > How would such a platform without file systems handle #include?

    I don't think it would, you'd cross-compile for it on a platform with a file system. I think the parent poster's point was that C is the only option for some ultra-low-resource platforms and that a conservative approach should be taken to adding new features in general. I don't think they were saying specifically that not having a filesystem is problematic for this particular inclusion.

  • #include is with regard to the source platform, not the target platform; you (generally) need a filesystem to compile, but you don't need a filesystem to run what you compiled.
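
To make the point in the replies above concrete: `#embed` is resolved by the preprocessor on the machine that runs the compiler, so the target needs no file system at all. The following is only a sketch under assumptions. `HAVE_EMBED`, `blob` and `blob_size` are names made up for the example, it embeds its own source file through `__FILE__` purely so that it does not depend on any asset existing, and it assumes a toolchain that exposes `__has_embed` (it falls back to a literal array when that is missing or the lookup fails). Real use would name an actual asset file instead.

```c
#include <assert.h>
#include <stddef.h>

/* Detect #embed support; older preprocessors skip the inner test entirely. */
#if defined(__has_embed)
#  if __has_embed(__FILE__)
#    define HAVE_EMBED 1   /* hypothetical helper macro, not a standard name */
#  endif
#endif

/* The bytes are baked into the object file at translation time. */
static const unsigned char blob[] = {
#ifdef HAVE_EMBED
#  embed __FILE__
#else
    'f', 'a', 'l', 'l', 'b', 'a', 'c', 'k'
#endif
};

/* The size is a compile-time constant too. */
static_assert(sizeof blob > 0, "blob must not be empty");

/* At run time this is an ordinary array: no fopen(), no file system. */
static size_t blob_size(void) {
    return sizeof blob;
}
```

Whether the build machine is a laptop and the target a microcontroller makes no difference to this code, which is the same situation `#include` has always been in.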

Congratulations! #embed is a very useful feature.

If I may gripe about C for a bit though. I do truly appreciate C's portability. It's possible to target a very diverse set of architectures and operating systems with a single source. Still, I do wish it would actually embrace each architecture, rather than try to mediate between them. A lot of my gripes with C are due to undefined behaviour which is left as such because of platform differences. I've never seen my program become faster if I remove `-fwrapv -fno-strict-aliasing`, but removing them has resulted in bugs due to compiler optimisations. I really wish by default "undefined behaviour" would become "platform-specific behaviour", with an officially blessed way to tell the compiler it can perform further optimisations based on data guarantees.

C occupies a very pleasant niche where it lets you write software for the actual hardware, rather than for a VM, while still being high level enough to allow for expressiveness in algorithms and program organisation. I just wish by default every syntactically valid program would also be a well-defined program, because the alternative we have now makes it really hard to reason about and prove program correctness (i.e. that it does what you think it does).

Thanks for your work on the C standard. Any changes that are made will remain forever, so I'm glad the committee takes this seriously.

I'm curious what you think of UB from a standard perspective --- were things left undefined and not just implementation-defined because there was simply so much diversity in existing and possibly future implementations that specifying any requirements would be unnecessarily constraining? I can hardly believe that it was done to encourage compiler writers to do crazy nonsensical things without regard for behaving "in a documented manner characteristic of the environment" which seems like the original intent, yet that's what seems to have actually happened.

  • >I'm curious what you think of UB from a standard perspective

    I think a lot about that! I'm a member of the UB study group and the lead author of a Technical Report we hope to release on UB.

    In short, "Undefined behavior" is poorly named. It should have been called "Things compilers can assume the program won't do". With what we call "assumed absence of UB" compilers can and do do a lot of clever things.

    Until we get the official TR out, you may find a video I made on the subject interesting:

    https://www.youtube.com/watch?v=w3_e9vZj7D8
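
As a small illustration of "things compilers can assume the program won't do", here is a sketch around signed integer overflow, the case that `-fwrapv` (a GCC/Clang option mentioned earlier in the thread) turns into defined wraparound. The function names are invented for the example. The first check relies on overflow happening and is therefore something an optimizer is permitted to treat as always false; the second asks the same question without ever overflowing.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Broken idea: "x + 1 < x" can only be true if x + 1 overflowed, and
 * signed overflow is undefined behaviour, so a compiler may assume it
 * never happens and reduce this whole function to "return false".
 * Shown for contrast only; do not call it with INT_MAX. */
static bool increment_overflows_ub(int x) {
    return x + 1 < x;
}

/* Well-defined alternative: test the operands before adding, so the
 * arithmetic in the check itself can never overflow. */
static bool add_would_overflow(int a, int b) {
    if (b > 0) return a > INT_MAX - b;   /* INT_MAX - b cannot overflow here */
    if (b < 0) return a < INT_MIN - b;   /* INT_MIN - b cannot overflow here */
    return false;                        /* adding zero never overflows */
}
```

With `-fwrapv` the first function behaves the way its author probably expected; without it, only the second one is reliable across optimization levels.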

> And when I say truly, I mean for platforms without file systems, or operating systems or where bytes aren't 8 bits, that don't use ASCII or Unicode, where NULL isn't at address 0 and so on.

Genuine question: why do we want these platforms to live, rather than to be forced to die? They sound awful.

I understand retrocomputing, legacy mainframes, etc; but 99% of that work is done in non-portable assembler and/or some flavor of BASIC; not in C.

  • Many of these platforms are microcontrollers, DSPs or other programmable hardware that are in every device nowadays, so it's not retro, it's very much current technology.

    • Once again — I can understand wanting to program this hardware, but who's programming it in C, rather than writing directly to the metal in order to squeeze every cycle out of these?

  • Because they weren't necessarily awful. Because in the future we may discover we need to do "weird" things again, for performance or other reasons.