I represent Sweden in ISO WG14, and I voted for the inclusion of #embed into C23. It's a good feature, but it's not a necessary feature, and I think JeanHeyd is wrong in his criticism of the pace of WG14's work. I have found everyone in WG14 to be very hardworking and serious about their work.
C's main strength is its portability and simplicity. Therefore we should be very conservative and not add anything quickly. There are plenty of languages to choose from if you want a "modern" language with lots of conveniences. If you want a truly portable language, there is really only C. And when I say truly, I mean for platforms without file systems or operating systems, where bytes aren't 8 bits, that don't use ASCII or Unicode, where NULL isn't at address 0, and so on.
We are the stewards of this, and the work we put in, while large, is tiny compared to the impact we have. Any change we make needs to be addressed by every compiler maintainer. There are millions of lines of code that depend on every part of the standard. A 1% performance loss means millions of tons of CO2 released and billions in added hardware and energy costs.
In this privileged position, we have to be very mindful of the concerns of our users, and take the time to look at every corner case in detail before adding any new features. If we add something, people will depend on its behavior, no matter how bad, and we will therefore have great difficulty fixing it in the future without breaking our users' work, so we have to get it right the first time.
> for platforms without file systems or operating systems, where bytes aren't 8 bits, that don't use ASCII or Unicode, where NULL isn't at address 0, and so on.
This seems totally misconceived to me as a basis for standardizing a language in 2022. You are optimizing for the few at the expense of the many.
I get that these strange architectures need a language. Why does it have to be C or C++? They can use a nonstandardized variant of C, but why hobble the language that is 99% of the time used on normal hardware with misfeatures justified by truly obscure platforms?
It doesn't have to be C, but as of today there is no other option. No one is coming up with new languages with these kinds of features, so C it is. People should, but language designers today are more interested in memory safety and clever syntax than in portability.
I would like to caution you against thinking that these weird platforms are old machines from the 60s that only run in museums. For instance, many DSPs have 32-bit bytes (the smallest memory unit that can be individually addressed), so if you have a pair of fancy new noise-canceling headphones, it's not unlikely you are wearing a platform like that on your head every day.
> This seems totally misconceived to me as a basis for standardizing a language in 2022. You are optimizing for the few at the expense of the many.
Sure, but it's the same line of reasoning that made C relevant in the first place, and keeps it relevant today - some library your dad wrote for a PDP-whatever is still usable today on your laptop running Windows 10.
Because it's antiquated, it's also extremely easy to support, and to port to new and/or exotic platforms.
C is pretty much the only language in common use for programming microcontrollers. Microcontrollers seldom have filesystems. To break the language on systems without filesystems or terminals means to break the software of pretty much every electronics manufacturer out there.

As the GP post comments, if you want those features there are plenty of other languages to choose from.

I don't even like programming in C but I respect what the committee is trying to do, and yes, I do sometimes write C code.
I would say that one should be pretty cautious when baking assumptions about such a fleeting thing as hardware into such a lasting thing as a language.
C itself carries a lot of assumptions about computer architecture from the PDP-9 / PDP-11 era, and this does hold current hardware back a bit: see how well the cool nonstandard and fast Cell CPU fared.
A language standard should assume as little about the hardware as possible, while also, ideally, allowing one to describe properties of the hardware somehow. C tries hard, but the problem is not easy at all.
It's worse: almost all of them already use a nonstandard variant of C. The committee is bending over backwards to accommodate them, but they literally _do not care what the standard says_, so this doesn't even benefit them. Most will keep using a busted C89 toolchain with a haphazard mix of extensions no matter what the standard does.

This is purely compiler-side, and usually those esoteric hosts are not running the compiler; they're being cross-compiled for, aren't they?

Well, and studiously not talking to the few about their actual needs.
This reasoning has always rung mostly hollow for compiler features (#embed, typeof) as opposed to true language features (VLAs, closures).
Modern toolchains must exist for marginal systems. It's understandable to want to write code for a machine from 1975, or a bespoke MCU, on a modern Thinkpad. It is not necessary to support a modern compiler running on the machine from 1975 / bespoke MCU. You might as well argue against readable diagnostic messages because some system out there might not be able to print them!
I could also see this, though perhaps it's a step too far for C, applying to Unicode encoding of source files.
The 1970s mainframe this program will run on has no idea that Unicode exists. Fine. But, the compiler I'm using, which must have been written in the future after this was standardised, definitely does know that Unicode exists. So let's just agree that the program's source code is always UTF-8 and have done with it.
Jason Turner has a talk where the big reveal is, the reason the slides were all retro-looking was that they were rendered in real time on a Commodore 64. The program to do that was written in modern C++ and obviously can't be compiled on a Commodore 64 but it doesn't need to be, the C64 just needs to run the program.
> And when I say truly, I mean for platforms without file systems

Are we really talking about compiling on such platforms? And if that's the case, how would #include work but not #embed?

No, I'm mainly talking about targeting. My point is not so much about #embed, but rather that almost anything you think you know about how computers work isn't necessarily true, because C targets such a wide group of platforms. Almost always, when someone raises a question along the lines of "No platform has ever done that, right?", someone knows of a platform that has done that, and it turns out to have very good reasons for doing so.
For this reason, everything is much more complicated than you first think. For me, joining WG14 has been an amazing opportunity to learn the depths of the language. C is not big, but it is incredibly deep. The answer to "Why does C not just do X?" is almost always far more complicated and thought through than one thinks.
Everyone in WG14 who has been around for a while knows this, and therefore assumes that even the simplest addition will cause problems, even if they can't come up with a reason why.
"""Codify existing practice to address evident deficiencies. Only those concepts that have some prior art should be accepted. (Prior art may come from implementations of languages other than C.) Unless some proposed new feature addresses an evident deficiency that is actually felt by more than a few C programmers, no new inventions should be entertained."""
Well, basic string support would be fine, wouldn't it? The C standard still having no proper string library after decades didn't harm its popularity, but still.
You cannot find non-normalized substrings (strings are Unicode nowadays), and UTF-8 is unsupported. coreutils and almost all tools don't have proper string (i.e. Unicode) support.
> where NULL isn't at address 0

Isn't there literally only a single GPU for which that is true?
Asking because every time this surfaces, someone inevitably asks for an example, and the only example I've seen over the years was of one specific (Nvidia?) GPU that uses a NULL of 0xFFFFFFFA (or something similar).
That is, do you know how common it is for NULL to not be 0?
There's a lot of platforms where you might want to do this. If you're programming bare metal, "address 0" might be a physical address that you expect stuff to exist at, so it might be relevant to use the bit pattern 0xffffffff instead. If you're targeting a blockchain or a WASM VM, you may also not have memory protection to work with, just a linear array of memory. And some machines don't even have bit patterns for pointers, like, say, a Lisp machine.

It's true (in some memory spaces) on AMD GPUs too:

https://llvm.org/docs/AMDGPUUsage.html#memory-spaces

Here is an answer that includes a few example systems, from comp.lang.c:

https://c-faq.com/null/machexamp.html
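Worth noting alongside those examples: whatever the platform's bit pattern, the source-level meaning of 0 in pointer context doesn't change. A minimal sketch of the distinction, not tied to any particular platform:

#include <string.h>

int *p = 0;  /* a null pointer, whatever the platform's bit pattern is;
                the compiler translates the constant 0 for you */

void zap(void) {
    memset(&p, 0, sizeof p);  /* all-zero *bits*: NOT guaranteed to be
                                 a null pointer on such platforms */
}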
People who call C simple have some weird definition of simple. How many C programs contain UB or are pure UB? Probably over 95%. The language is not simple at all.
A straight razor is simple, and that's why it's the easiest to cut yourself with. An electric razor is much safer precisely because much engineering went into its creation.

Thank you for your post!

Thank you especially for reminding everybody that programming is much more than web programming and information systems.

Thank you,
It's also worth remembering that a lot of higher-level languages have runtimes/VMs that are implemented in C. Web applications rely heavily on databases, JavaScript VMs, network stacks, system calls, and operating system features, all of which are implemented in C.
If you are a software developer and want to do something about climate change, consider becoming a compiler engineer. If you manage to get a couple of tenths of a percent of performance increase into one of the big compilers during your career, you will have materially impacted global warming. Compiler engineers are the unsung heroes of software engineering.
How would such a platform without file systems handle #include?
Reading further, I don't think this was ever addressed when someone else brought it up. I cannot for the life of me imagine a system where #include works but #embed doesn't. Again, it's fine if some systems have non-standard subsets of the C standard... but why hobble the actual standard for code which can be compiled on systems where you have a filesystem (which will handle #include, by the way) for the sake of the systems without filesystems?
> How would such a platform without file systems handle #include?
I don't think it would; you'd cross-compile for it on a platform with a file system. I think the parent poster's point was that C is the only option for some ultra-low-resource platforms and that a conservative approach should be taken to adding new features in general. I don't think they were saying specifically that not having a filesystem is problematic for this particular inclusion.
#include is with regard to the source platform, not the target platform. I.e., you (generally) need a filesystem to compile, but you don't need a filesystem to run what you compiled.
Congratulations! #embed is a very useful feature.

If I may gripe about C for a bit, though: I do truly appreciate C's portability. It's possible to target a very diverse set of architectures and operating systems with a single source. Still, I do wish it would actually embrace each architecture rather than try to mediate between them. A lot of my gripes with C are due to undefined behaviour which is left as such because of platform differences. I've never seen my program become faster when I remove `-fwrapv -fno-strict-aliasing`, but it has resulted in bugs due to compiler optimisations. I really wish that by default "undefined behaviour" would become "platform-specific behaviour", with an officially blessed way to tell the compiler it can perform further optimisations based on data guarantees.
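For instance, the classic type-punning read is the kind of thing -fno-strict-aliasing forgives. A minimal sketch (function names are mine, and it assumes float and unsigned int have the same size):

#include <string.h>

/* UB under the standard's aliasing rules, though it "works" on most
   real hardware: */
unsigned int bits_ub(float f) { return *(unsigned int *)&f; }

/* The well-defined spelling of the same thing: */
unsigned int bits_ok(float f) {
    unsigned int u;
    memcpy(&u, &f, sizeof u);
    return u;
}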
C occupies a very pleasant niche where it lets you write software for the actual hardware, rather than for a VM, while still being high level enough to allow for expressiveness in algorithms and program organisation. I just wish by default every syntactically valid program would also be a well-defined program, because the alternative we have now makes it really hard to reason about and prove program correctness (i.e. that it does what you think it does).
I'm curious what you think of UB from a standard perspective --- were things left undefined and not just implementation-defined because there was simply so much diversity in existing and possibly future implementations that specifying any requirements would be unnecessarily constraining? I can hardly believe that it was done to encourage compiler writers to do crazy nonsensical things without regard for behaving "in a documented manner characteristic of the environment" which seems like the original intent, yet that's what seems to have actually happened.
>I'm curious what you think of UB from a standard perspective
I think a lot about that! I'm a member of the UB study group and the lead author of a Technical Report we hope to release on UB.
In short, "undefined behavior" is poorly named. It should have been called "things compilers can assume the program won't do". With what we call "assumed absence of UB", compilers can, and do, do a lot of clever things.
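A minimal sketch of what that enables (a classic example, not taken from the TR):

int always_true(int x) {
    /* Signed overflow is UB, so the compiler may assume x + 1 cannot
       wrap, and compile this whole function down to "return 1;". */
    return x + 1 > x;
}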
Until we get the official TR out, you may find a video I made on the subject interesting:
> And when I say truly, I mean for platforms without file systems or operating systems, where bytes aren't 8 bits, that don't use ASCII or Unicode, where NULL isn't at address 0, and so on.
Genuine question: why do we want these platforms to live, rather than to be forced to die? They sound awful.
I understand retrocomputing, legacy mainframes, etc; but 99% of that work is done in non-portable assembler and/or some flavor of BASIC; not in C.
Many of these platforms are microcontrollers, DSPs, or other programmable hardware that are in every device nowadays, so it's not retro; it's very much current technology.
Enjoy the naysayers if you like! I'm glad someone spent the time and effort to push past them. Bit too late for me - I have moved on to Rust, which has had support for this since version 1.0.0.
> There's also the standard *nix/BSD utility "xxd".
> Seems like the niche is filled. Or, at least, if you want to claim that
>...do NOT completely fill this evolutionary niche
> This ultimately would encourage a weird sort of resource management philosophy that I think might be damaging in the long run.
> Speaking from experience, it is a tremendously bad idea to bake any resource into a binary.
> I'll point out that this is a non-issue for Qt applications that can simply use Qt's resources for this sort of business.
(Though credit to Matthew Woehlke, he did point out a solution which is basically identical to #embed)
> I find this useless specially in embedded environments since there should be some processing of the binary data anyway, either before building the application
In fairness there was a decent amount of support. But given the insane amount of negativity around an obviously useful feature I gave up.
I wonder if there was a similar response to the proposal to include `string::starts_with()`...
> > Speaking from experience, it is a tremendously bad idea to bake any resource into a binary.
What a pompous douche whoever wrote that was.
> > This ultimately would encourage a weird sort of resource management philosophy that I think might be damaging in the long run.
So, this might be a valid point, although not enough to reject the feature over. It's true that it's a feature that could potentially see overuse and abuse. But then, so did templates :-P
> told me this form was non-ideal and it was worth voting against (and that they’d want the pure, beautiful C++ version only[1])
I heard about #embed, but I didn't hear about std::embed before. After looking at the proposal, to me it does look a lot better than #embed, because reading binary data and converting it to text, only to then convert it to binary again seems needlessly complex and wasteful. I also don't like that it extends the preprocessor, when IMHO the preprocessor should at worst be left as is, and at best be slowly deprecated in favour of features which compose well with C proper.
Going beyond the gut reaction and moving on to hard data, as you can expect from this design, std::embed of course is faster during compilation than #embed for bigger files (comparable for moderately-sized files, and a bit slower for tiny files).
I'm not a huge fan of C++, but the fact that C++ removed trigraphs in C++17 and that it's generally adding features replacing the preprocessor scores a point with me.
Compilers follow the "as if" principle, they don't have to literally follow the formal rules given by the standard. They could implement #embed by doing as you say, pretty printing out numbers and then parsing them back in again. But that would be an extremely roundabout way to do it, so I doubt anyone will actually do it that way. Unless you're running the compiler in some kind of debugging mode like GCC's -E.
I don’t think the implication is that the C compiler must encode the binary file as a comma-separated integer list and then re-parse it, only act as if it did so.
How would that work? It would need to depend on the grammar of the surrounding C code. This directive isn't limited to variable initialisers; you can use it anywhere. So e.g. you can use it inside a structure declaration, or between "int main()" and "{", etc. Those will generate errors in subsequent phases, but during preprocessing the compiler doesn't know about that. Then there is also just this:
int main () {
    return
#embed "file.bin"
    ;
}
There are plenty of cases where it will all behave differently. And if you're going to pretend even harder that the preprocessor understands C syntax, then why not just give this job to the compiler proper, which actually understands it?
People don't dislike it because they are unaware how helpful it can be. They dislike it because they are aware how hacky, fragile and error-prone it is. They want something more robust than text substitution.
People who don't like it have generally used macros that are more sophisticated than just blindly copy-pasting text into your source files, and have become aware of how absurd that is.
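For instance, token-pasting games along these lines (a contrived sketch; the names are made up):

struct { int width, height; } s;

/* Token pasting builds identifiers the IDE can't see or index: */
#define DECLARE_GETTER(field) int get_##field(void) { return s.field; }

DECLARE_GETTER(width)  /* silently defines get_width() */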
Constructs like that obliterate tooling such as IDEs. Of course, this is a contrived example, but the preprocessor is just one big footgun, which offers no benefits over other ways of solving the problems you mentioned, such as constexpr and perhaps additional, currently unimplemented solutions.
This serves the same use as Rust's `include_bytes!` macro, right? Presumably most people just use this feature as a way to avoid having to stuff binary data into a massive array literal, but in our case it's essential because we're actually using it to stuff binaries from earlier in our build step into a binary built later in the build step. Not something you often need, but very handy when you do.
This has different affordances than std::include_bytes! but I agree that if you were writing Rust and had this problem you'd reach for std::include_bytes! and probably not instead think "We should have an equivalent of #embed".
include_bytes! gives you a &'static [u8; N] which for non-Rust programmers means we're making a fixed size array (the size of your file) full of unsigned 8-bit integers (ie bytes) which lives for the life of the program, and we get an immutable reference to it. Rust's arrays know how big they are (so we can ask, now or later) but cannot grow.
#embed gets you a bunch of integers. The as-if rule means your compiler is likely to notice if what you're actually doing is putting those integers into an array of unsigned 8-bit integers and just stick all the file bytes in the array, short cutting what you wrote, but you could reasonably do other things, especially with smaller files.
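For comparison, typical C23 usage would look something like this (a sketch; it assumes "logo.png" is findable at compile time):

#include <stddef.h>

static const unsigned char logo[] = {
#embed "logo.png"
};
static const size_t logo_len = sizeof logo;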
As the article quotes, in C the lack of standardisation makes this tricky when you want to support more than one compiler, or even when you want to support just one compiler (cf. the email about the hacks to make it work on GCC with PIE).
> Even among people who control all the cards, they are in many respects fundamentally incapable of imagining a better world or seizing on that opportunity to try and create one, let alone doing so in a timely fashion.
That does sound soul-crushing. Congrats on this achievement!
This is simply wrong. We (ISO WG14) don't hold the cards; compilers are free to implement whatever they want, and users are free to use whatever tools or languages they want.
We exist only as long as we are trusted to be good stewards, and only go forward with the consensus of the wider community.
It can be true that you and the ISO team are good stewards of the C standard. Thank you for being part of that.
And it can also be true that it was "hell" and "hardly worth it" for the OP to get a new feature added to the language. I believe it was a miserable experience that has him questioning how he spends his time.
Both can be true. Thank you for your efforts. And thank the OP for his efforts too.
> > Even among people who control all the cards, they are in many respects fundamentally incapable of imagining a better world or seizing on that opportunity to try and create one, let alone doing so in a timely fashion.
> This is simply wrong. We (ISO WG14) don't hold the cards; compilers are free to implement whatever they want, and users are free to use whatever tools or languages they want.
This is an incredibly oblivious realization of JeanHeyd's point.
I think in our reality the prerequisite for holding all the cards is a lack of competence in knowing how to improve the world. We've gotten where we are now through the sheer force of will of those who are empty-handed.
The reasonable man adapts himself to the world: the unreasonable one persists in trying to adapt the world to himself. Therefore all progress depends on the unreasonable man.
This reminds me: I'd argue that the explosion of JS frameworks can mainly be blamed on one thing: the lack of an <include src="somemodule.html"> tag. If you had that, you'd basically have 80% of vue.js natively supported already. No clue why this was never added in any fashion. Change my mind.
HTML imports were part of the original concept of Web Components, and I think they were supported in Chrome. If you look up examples of things built with Polymer 1.x, it was used extensively.
It was actually pretty neat, because you could have an HTML file with a template, style, and script section.
Safari rejected the proposal, so it had to get dropped.
But ESM makes it a bit redundant anyway. The end-goal is to allow you to import any kind of asset, not just JS. There have been demos and examples of tools supporting this going back over half a decade at this point.
Not the parent comment, but my personal use case is for rendering a selectable list. The server side would render a static list with fragment links (ex. `#item-10`) and include elements with corresponding IDs, and a `:target` css rule to unhide the element. This would hopefully be paired with lazy loading the include elements.
edit:
My goal is to avoid reloading the page for each selection and rendering all items eagerly. JS frameworks are the only ones that really allow this behavior.
Honestly I'm usually very wary of additions to C, as one of its greatest strengths (to me) is how rather straightforward it is as a language in terms of conceptual simplicity. There just aren't that many big concepts to understand in the language. (On the other hand there's _many_ footguns but that's another issue.)
That said, to me this seems like a great addition to the language. It's very single-purpose in its usage (so it doesn't seem to add much conceptual complexity to the language) and it replaces something genuinely painful (arcane linker hacks). I'm very much looking forward to using this as I often make single-executable programs in C. The only thing that's unfortunate is I'm sure it'll take decades before proprietary embedded toolchains add support for this.
The first commandment of C is: 'writing a naive C compiler should be "reasonable" for a small team or even one individual'. That's getting harder and harder, longer and longer.
I have moved from C being "the best compromise" to "the least bad compromise".
I wish we had a "C-like" language, which would kind of be a high-level assembler which: has no integer promotion or implicit casts, has compile-time/runtime casts (without the horrible c++ syntax), has sized primitive types (u64/s64,f32/f64,etc) at its core, has sized literals (42b,12w,123dw,2qw,etc), has no typedef/generic/volatile/restrict/etc well that sort of horrible things, has compile-time and runtime "const"s, and I am forgetting a lot.
Among the main issues: the kernel GCC C dialect (roughly speaking, each Linux release uses more GCC extensions), and aggressive optimizations that can break some code (while programming some hardware, for instance).
Maybe I should write assembly, expect RISC-V to be a success, and forget about all of this.
I wish we had something like typed Lua without Lua's weird quirks (e.g. indexing from 1), designed with performance and safety in mind, and with the features you mention.
But like Lua, the base compiler is really small and simple and can be embedded. And it's "pseudo-interpreted": ultimately it's an ahead-of-time language, to support things like function declarations after references and proper type checking, but compiling unoptimized is practically instant, and you can load new sources at runtime, start a REPL, and do everything else you can with an interpreted language. Now, having a simple compiler with all these features may be impossible, so worst case there is just a simple interpreter, a separate type-checker, and a separate performance-optimized JIT compiler (like Lua and LuaJIT).
Also like Lua and high-level assembly, debugging unoptimized is also really simple and direct. By default, there aren’t optimizations which elide variables, move instructions around, and otherwise clobber the data so the debugger loses information, not even tail-call optimization. Execution is so simple someone will create a reliable record-replay, time-travel debugger which is fast enough you could run it in production, and we can have true in-depth debugging.
Now that I've written all that, I realize this is basically ML. But OCaml still has weird quirks (the object system), SML too, honestly, and I doubt their compilers are small and simple enough to be embedded. So maybe a modern ML dialect with a few new features and none of the more confusing things in Standard ML.
Check out Nim! It does much of what you describe, and it's great. The core language is fairly small (not quite Lua-simple, but probably ML-comparable). It compiles fast enough that a Nim REPL like `inim` is usable for checking features and for basic maths. It requires a C compiler, but TCC [4] works perfectly. Essentially Nim + TCC is pretty close to your description, IMHO. Though I'm not sure TCC supports non-x86 targets.
I've never used it, but Nim supports some hot reloading as well [3]. It also has a real VM if you want to run user scripts, and has a nice library for it [1]. It's not quite Lua-flexible, but for a generally compiled language it's impressive.
Recently I made a wrapper to embed access to the Nim compiler's macros at runtime [2]. It took 3-4 hours and still compiles in tens of seconds despite building in a fair bit of the compiler! It was useful for making a code generator for a serializer format. I'm not sure it's small enough to live on even beefy M4/M7 microcontrollers, though I'm tempted to try.
GCC or Clang with all warnings turned on will give you almost what you want: -Wconversion, -Wdouble-promotion, and hundreds of others. A good way to learn about warning flags (apart from reading the docs) is Clang's -Weverything, which will give you many, many warnings.
I agree (with a lot of caveats), but a key value of C is that we do not break people's code, and that means we can't easily remove things. If we do, we create a lot of problems. This makes it very difficult to keep the language as easy to implement as we would like. As a member of WG14, I intend to propose that we make this a prime priority going forward.
> I wish we had a "C-like" language, which would kind of be a high-level assembler which: has no integer promotion or implicit casts, has compile-time/runtime casts (without the horrible c++ syntax), has sized primitive types (u64/s64,f32/f64,etc) at its core, has sized literals (42b,12w,123dw,2qw,etc), has no typedef/generic/volatile/restrict/etc well that sort of horrible things, has compile-time and runtime "const"s, and I am forgetting a lot.
Unsafe Rust code I think fits this model better than C does: it relies on sized primitive types, it has support for both wrapping and non-wrapping arithmetic rather than C's quite frankly odd rules here, it has no automatic implicit casts, it has no strict aliasing rules.
> The first commandment of C is: 'writing a naive C compiler should be "reasonable" for a small team or even one individual'. That's getting harder and harder, longer and longer.
100% agreed. I've always viewed C as a "bootstrappable" language, in which it is relatively straightforward to write a working compiler (in a lower level language, likely Asm) which can then be used to bring up the rest of an environment. The preprocessor is actually a little more difficult in some respects to get completely correct, and arguably #embed belongs there, so it's debatable whether this feature is actually adding complexity to the core language.
Your wish for a "C-like" language sounds very much like B.
There is so much more to remove: one loop statement is enough (loop {}); enum should go away, along with the likes of typeof; etc.
I wonder if all that would make writing a naive "B+" compiler easier (in time/complexity/size) than a plain C compiler. I stay humble, since I know removing things does not always mean easier and faster; the real complexity may be hidden somewhere else.
I’m really amazed at how divisive this one is, and the number of comments here questioning what seems to me a really useful and well-thought-out feature, something I’d have loved to have used many many times over the years.
I guess the heated arguments here help me understand how it could have taken so long to get this standardised, though, so that’s something!
Congratulations and thank you to the OP for doing this, and thanks also for this really interesting (if depressing) view of the process.
This is a really, really good feature and I am so glad it is finally getting standardized. C23 is shaping up to be a very good revision to the C standard. I'm hoping the proposal to allow redeclaration of identical structs gets in as well, as you would finally be able to write code using common types without having to coordinate, which would allow interoperability between independently written libraries.
Congratulations to the author. Things like this are why I hope Carbon succeeds. Evolving C++ seems like a dumpster fire, despite whatever compelling arguments about compatibility you are going to drop on me.
The issue is that a lot of people just think about languages in the wrong way, which is the whole reason for pointless things like C++ expansions, Carbon, Rust, and stuff like this.
One of the fundamental ideas that people run with in language creation/expansion is "the programmer is stupid and/or makes mistakes" -> "let's add language features that intercept and control that stupidity/those mistakes".
And there is a very valid reason for this: it allows programmers of lesser skill and knowledge to pick up codebases and develop safe software, which has economic advantages in being able to hire less experienced devs to write software at lower salary points and spend next to no time fixing segfault issues due to complex memory management. The whole reason Java got so popular over C++ was its GC: both C++ and Java supported fairly strong typing with classes, but C++ still had a lot of semantics around memory management that had to be taken care of, whereas with Java you simply don't do anything.
However, people are applying this idea towards lower level languages, because they want the high performance of a compiled language with a whole bunch of features that make writing code as mistake free as possible.
And my challenge to that is this - why not spend the time making just smarter compilers/tooling?
Think about a hypothetical case where Rust gets all the features added to it that people want, and is widely used as the main language over all others. Looking at all the codebases, there will be a lot of common use patterns, a lot of the safety code duplicated over and over in predictable patterns, etc. And you will see these common things added to Rust. Just as with Java, where a lot of the predictable use patterns got abstracted into widely used libraries like Lombok, Spring, etc., so that you don't have to worry about correctness because you're using a library. And you essentially start to move towards more and more stuff being handled for you automagically, which is all part of the compiler/toolchain.
In the same way, #embed can be solved by a smart compiler. Have a static string that opens a file, and read its contents into a buffer that doesn't change? Auto-include that file in the binary if you want to target performance rather than executable size. No need for a special directive; just be smart about how you handle an open call, and leave the fine-tuning of this to specific compiler options.
And from the economic perspective of ease of use above, you would have a language like Python, which is super easy to pick up and program in, except instead of the interpreter you would have a compiler that spits out binaries. Python is already widely adopted primarily because of how easy it is to set up and use. Now imagine if you had the option to run a super smart compiler that highlights any potential issues that come with dynamic typing because it understands what you are trying to do, fixes any that it can, and once everything is addressed, spits out an optimized, memory-safe executable. With Rust, you code, compile, see you made a mistake somewhere with a reference, fix it, repeat. With this, you would code, compile, fix the mistake somewhere that the compiler warns you about, repeat. No difference.
Focusing on the toolchain also lets you think about integrating features from languages like Coq with provability, where you can focus not only on correctness processing/memory-wise, but also on "is the output actually correct". I.e., any piece of code, for all given input, can be specified to have a guaranteed bounded output set, which you can integrate into IDE tools to give you real-time feedback, letting you design the code in a way that avoids things like URL-parsing mistakes, which all the safety features of Rust won't catch.
As for C, you leave it as a version that has a stable, robust ABI, and then anything that you need to support is delegated to custom tools. That way, in a future where compute will likely be full of specialized ML chips, instead of worrying about writing a frontend to support every feature, you quickly get a notional toolchain made and are able to run existing C code.
I would expect that to produce an error, though. If I had a regular file that was not infinite in size, and I specified the wrong length for the array, I would find it more useful to have the compiler inform me as to the discrepancy rather than truncate my file.
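For the unbounded-resource case, the directive's limit parameter bounds how much is read; a sketch:

/* Take at most 16 bytes; without limit(...), embedding an unbounded
   device like /dev/urandom could never terminate. */
static const unsigned char seed[] = {
#embed </dev/urandom> limit(16)
};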
The preprocessor needs to run before the compiler, though, and isn't complex enough to understand the context of the code that it's in. That would be a substantially complex thing to implement.
This will indeed require delaying population of the array to the compilation stage. However, it's worth it for the convenience and the succinctness of the syntax, and it's not that substantially complex to implement.
Interesting. I look forward to this. What I've been doing now to embed a source.png file is something like this, where I generate source code from a file's data:
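/* Sketch of the generated output; the name is illustrative, and the
   eight bytes shown are just the PNG signature: */
static const unsigned char source_png[] = {
    0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a, /* ... */
};
static const unsigned int source_png_len = sizeof source_png;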
What about creating object files from raw binary files and then linking against them? That's what I (and of course many others) do for linking textures and shaders into the program. It's a bit ugly though that with this approach you can't generate custom symbol names, at least with the GNU linker.
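A sketch of that approach with GNU binutils (the exact flags vary by target):

/* objcopy -I binary -O elf64-x86-64 -B i386:x86-64 texture.bin texture.o
   objcopy then derives these symbol names from the input file name: */
#include <stddef.h>

extern const unsigned char _binary_texture_bin_start[];
extern const unsigned char _binary_texture_bin_end[];

#define TEXTURE_SIZE \
    ((size_t)(_binary_texture_bin_end - _binary_texture_bin_start))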
This #embed feature might be a nice alternative for small files. For large files you usually don't want to store them inside the binary anyway, so the compilation overhead should be minuscule, since the embedded files are, by intention, small.
When I read the introduction of the article - about allowing us to cram anything we want into the binary - I was hoping to see a standard way to disable optimizations (when the compiler deletes your code and you don't even notice).
You reminded me of Bethesda Softworks games, which always seem to have 1GB+ executables for some reason. I hope it isn't all code. Maybe they embed the most important assets that will always need to be loaded.
My guess is that the files are not truly embedded, as that would require loading the entire file into memory before running the application, which seems wasteful.
More likely, the actual executable is only a small part of the file which accesses the rest of the file as an archive, like a self-extracting zip. There may also be some DRM trickery going on.
Off the top of my head, I think there's some niche use in embedding shaders so that they don't need to be stored as strings (no IDE support) or read at runtime (slower performance).
There are a lot of use cases for baking binary data directly into the program, especially in embedded applications. For instance, if you are writing a bootloader for a device that has some kind of display you might want to include a splash screen, or a font to be able to show error messages before a filesystem or an external storage medium is initialized. Similarly, on a microcontroller with no external storage at all you need to embed all your assets into the binary; the current way to do that is to either use whatever non-standard tools the manufacturer's proprietary toolchain provides, or to use xxd to (inefficiently) generate a huge C source file from the contents of the binary file. Both require custom build steps and neither is ideal.
You can get some IDE support with a simple preprocessor macro[1].
It's a crutch, but at least you don't need to stuff the shader into multiple "strings" or have string continuations (\) at the end of every line. Plus you get some syntax highlighting from the embedding language. I.e. the shader is highlighted as C code, which for the most part seems to be close enough.
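The macro is usually some variant of stringizing the shader body (a sketch; the [1] reference may differ in detail):

/* #src stringizes the whole argument; the #version line is prepended
   separately because directives can't appear inside macro arguments.
   Top-level commas in the shader body would break the macro. */
#define GLSL(src) "#version 330 core\n" #src

static const char *frag_src = GLSL(
    out vec4 color;
    void main() { color = vec4(1.0); }
);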
nullptr, since we have type detection now and NULL needn't be a pointer; auto, because otherwise everybody would create their own hacky auto using the new typeof.
Yes, but it's linker-specific and non-portable. It can also come with some annoying limitations, like having to separately provide the data size of each symbol. In some cases this might be introspectable, but again comes at the expense of portability.
ELF-based variants of the IAR toolchain, for example, provide a means of directly embedding a file as an ELF symbol, but without the size information being directly accessible.
GNU ld and LLVM lld do not provide any embedding functionality at all (as far as I can see). You would have to generate a custom object file with some generated C or ASM encoding the binary content.
MSVC link.exe doesn't support this either, but there is the "resource compiler" to embed binary bits and link them in so they can be retrieved at runtime.
Having a universal and portable mechanism which works everywhere will be a great benefit. I'll be using it for compiled or text shaders, compiled or text lua scripts, small graphics, fonts and all sorts.
This article[1] shows how you can use GCC toolchain along with objcopy to create an object file from a binary blob, link it, and use the data within in your own code.
The article addresses this directly. If you're only targeting one platform then this is reasonably easy (albeit still not as easy as #embed), but if you need to be portable then it becomes a nightmare of multiple proprietary methods.
Sure, but to add binary data to any executable on any platform is more involved.
As an example, see [1]. That will turn any file into a C file with a C array, and I use it to embed a math library ([2]) into the executable so that the executable does not have to depend on an external file.
Five years ago I wrote a small Python script [1] to help me solve "the same problem".

It reads the files in a folder and generates a header file containing the files' data and filenames.

It's very simple and was written to help me on a job. It has limitations; don't be too hard on me :)
This is a cool feature and I'll likely be using it in the years to come. However, the widely available xxd utility and its -i option can achieve this capability portably today.
It will be useful to have it directly in the preprocessor, however. I wonder how quickly it can be added to cpp.
> It's also only suitable for tiny files: compile time and RAM requirements will blow up once you go beyond a couple of megabytes.
Do you know what makes it so? Is there a technical argument why the compiler could do better, except maybe for xxd not being specifically optimized for this use case?
The article spends a fair bit of time discussing the build speed and memory use problems with that approach. Like, the benchmark results [0] linked to from this post literally have xxd as one of the rows. It's not a viable option for embedding megabytes of data.
Scary; it's as if the preprocessor has become type-aware. I guess I'd better not imagine the result of the preprocessing as looking similar to, and following the same rules as, something I would have written by hand.
This might make manual inspection of the preprocessed file a bit painful.
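E.g., the -E output for an #embed of a small PNG might notionally look like the expanded comma-delimited list (implementations are free to render it differently):

static const unsigned char data[] = {
137,80,78,71,13,10,26,10
};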
One particular scenario that people have highlighted is developing for an embedded system that doesn't have any storage except flash memory, and no filesystem. In this kind of system, embedding static resources in the executable is the only reasonable option you have.
> “Touch grass”, some people liked to tell me. “Go outside”, they said (like I would in the middle of Yet Another Gotdang COVID-19 Spike). My dude, I’ve literally gotten company snail mail, and it wasn’t a legal notice or some shenanigans like that! Holy cow, real paper in a real envelope, shipped through German Speed Mail!! This letter alone probably increases my Boomer Cred™ by at least 50; who needs Outside anymore after something like this?
Touch grass indeed. Sure, #embed is a nice feature, but this self-indulgent writing style I can’t stand.
C89 is where C should've stayed. If you need to convert a file to a buffer and stick that somewhere in your translation unit, use a build system. Don't fuck with C.
> "Did you read the snail mail letter from someone who does just that?"
I did. The author struggled embedding files into their executables with makefiles. We don't know anything else beyond that. So what?
People also struggle with memory management in C, an arguably much more difficult and widespread problem. Should we introduce a garbage collector into the C spec? How about we just pull in libsodium into the C standard library because people struggle with getting cryptography right?
OP mentions #embed was a multi-year long uphill battle, with a lot of convincing needed at every turn. That in itself is enough proof that people aren't in clear agreement over there being a single "right" solution. Hence, leave this task to bespoke build systems and be done with it. Let different build systems offer different solutions. Allow for different syntaxes, etc. Leave the core language lean.
Don't the people making the standards have other things to do, like integrating useful features, instead of duplicating incbin.h [0] years after that feature worked?
> The directive is well-specified, currently, in all cases to generate a comma-delimited list of integers.
While a noble act, this is nearly as inefficient as using a code generator tool to convert binary data into intermediate C source. Other routes to embed binary data don't force the compiler to churn through text bloat.
It would be much better if a new keyword were introduced that could let the backend fill in the data at link time.
You should read or re-read the article and references. There are multiple benchmarks showing this not to be the case. Actually, half the article is a (well-deserved) rant about how wrong compiler devs were in thinking that parsing intermediate C sources could ever match the new directive. A compiler's internal representation of an array of integers also doesn't require a big pile of integer ASTs.
According to the benchmarking data, this extension is even 2x faster than using `objcopy` and the linker to insert a binary at link time, as you suggest.
The article definitely isn't glowing praise of C/C++. In fact, including this simple, useful feature that Rust has had for a decade took an immense amount of effort and received a lot of pushback from various parties, in part due to the strangled mess of various compiler limitations and in part because of design-by-committee stupidity.
Officially, Rust's R-cog logo is the symbol of Rust. It is a registered trademark of the Rust Foundation.
But it's a bit boring. Unofficially, Rust has a mascot, in the form of a crab named "Ferris". The crab mascot appears in lots of places, and the Unicode crab emoji U+1F980 is often used by Rust programmers to indicate Rust in text. Unlike the trademarked logo, you can have a bit of fun with such an unofficial symbol, for example Jon Gjengset's "Rust for Rustaceans" book cover has a stylised crab wearing glasses with a laptop apparently addressing a large number of other crabs.
> vendor extensions ... were now a legal part of the syntax. If your implementation does not support a thing, it can issue a diagnostic for the parameters it does not understand. This was great stuff.
I can't be the only one who thinks a magic comment is already an ugly escape hatch; adding a mini DSL to it that can mean anything to anyone just makes it ten times worse. It's neither beautiful nor great.
> do extensions on #embed to support different file modes, potentially reading from the network (with a timeout), and other shenanigans.
To be completely honest, I find the fact that this was raised by the committee to be really obtuse and unnecessary. The same "complaint" could be raised about #include as well.
If you want to include data from a continuous stream from a device node, then you could just as easily have the data piped into a temporary file of defined size and then #embed that. No need to have the compiler cater for a problem of your own making.
As for the custom data types: it's a byte array. Why not leave any structure you wish to impose on the byte array up to the user? They can cast it to whatever they like. Not sure why that's anything to do with the #embed functionality.
Both these things seem to be massive overthinking on the part of the committee members. I'm glad I'm not participating, and I really do thank the author for their efforts there. We've needed this for decades, and I'm glad it's got in even if those ridiculous extensions were the compromise needed to get it there.
I’ve known C for close to two decades, thank you. I’m using the not at all well defined term “magic comment” to loosely refer to everything that’s not strictly speaking code but has special meaning, which include pre-processor directives.
Personally I feel the C committee should've disbanded after the first standard (and the C++ one after the 2003 technical corrigendum). I didn't mind C99 much, but it looks like C(++)reeping featuritis is a nasty habit.
These gratuitous standards prompt newbies to use the new features (it's "modern") and puzzled veterans to keep up and re-internalize their understanding of new variants of languages they've been using for decades. There's no real improvement, just churn. Possibly it's one of the instruments of ageism. More incompatibility with existing software and existing programmers.
One problem of this new millennium is that the field has developed a tendency to do old things with new languages instead of doing new things with old languages.
Reinventing another polygonal wheel approximation while constantly tweaking the theory used serves to segregate the experienced (who may have trouble accepting such tweaks or their necessity - they already know about the existence of wheels anyway) from the newbies (who have no previous intuitions and no taste for legitimate and spurious novelty).
Newbies are cheap, and new ideas are hard. Let's do some mental rent seeking.
This is a cool feature, but the author doesn't do himself any favors with a style of writing that greatly overestimates the importance of his own feature in the grand scheme of things. Remarks like "an extension that should've existed 40-50 years ago" make me wonder whether we should really have bothered all compiler vendors with implementing this 40-50 years ago. After all, you can already a) directly put your binary data in the source file, as shown after the preprocessor step, and b) read the file at runtime. I'm not saying this isn't useful, but it's a rather niche performance improvement more than a core language feature.
I represent Sweden in the ISO WG14, and I voted for the inclusion of Embed in to C23. Its a good feature. But its not a necessary feature and I think JeanHeyd is wrong in his criticism of the pace of wg14 work. I have found everyone in wg14 to be very hardworking and serious about their work.
Cs main strengthen is its portability and simplicity. Therefore we should be very conservative, and not add anything quickly. There are plenty of languages to choose form if you want a "modern" language with lots of conveniences. If you want a truly portable language there is really only C. And when I say truly, I mean for platforms without file systems, or operating systems or where bytes aren't 8 bits, that doesn't use ASCI or Unicode, where NULL isn't on address 0 and so on.
We are the stewards of this, and the work we put in, while large, is tiny compared to the impact we have. Any change we makes, needs to be addressed by every compiler maintainer. There are millions of lines of code that depend on every part of the standard. A 1% performance loss is millions of tons of CO2 released, and billions in added hardware and energy costs.
In this privileged position, we have to be very mindful of the concerns of our users, and take the time too look at every corner case in detail before adding any new features. If we add something, then people will depend on its behavior, no matter how bad, and we therefor will have great difficulty in fixing it in the future without breaking our users work, so we have to get it right the first time.
> for platforms without file systems, or operating systems or where bytes aren't 8 bits, that doesn't use ASCI or Unicode, where NULL isn't on address 0 and so on.
This seems totally misconceived to me as a basis for standardizing a language in 2022. You are optimizing for the few at the expense of the many.
I get that these strange architectures need a language. Why does it have to be C or C++? They can use a nonstandardized variant of C, but why hobble the language that is 99% used on normal hardware with misfeatures that are justified by trule obscure platforms.
It doesn't have to be C, but as of today there is no other option. No one is coming up with new languages with these kinds of features so C it is. People should, but language designers today are more interested in memory safety and clever syntax, than portability.
I would like to caution you against thinking that these weird platforms are old machines from the 60s that only run in museums. For instance many DSPs have 32bit bytes (smallest memory unit that can be individually addressed), so if you have a pair of new fancy noise canceling headphones, then its not unlikely you are wearing a platform like that on your head everyday.
29 replies →
> This seems totally misconceived to me as a basis for standardizing a language in 2022. You are optimizing for the few at the expense of the many.
Sure, but it's the same line of reasoning that made C relevant in the first place, and keeps it relevant today - some library your dad wrote for a PDP-whatever is still usable today on your laptop running Windows 10.
Because it's antiquated, it's also extremely easy to support, and to port to new and/or exotic platforms.
17 replies →
C is pretty much the only language in common use for programming microcontrollers. Microntrollers seldomly have filesystems. To break the language on systems without filesystems or terminals means to break the software of pretty much every electronics manufacturer out there.
9 replies →
As the GP post comments, if you want those features there are plenty of other languages to choose from.
I don’t even like programming in C but I respect what the committee is trying to do, and yes I do sometimes write C code.
3 replies →
I would say that one should be pretty cautious when baking in assumptions snouty such a fleeting thing as hardware into such a lasting thing as a language.
C itself carries a lot of assumptions about computer architecture from the PDP-9 / PDP-11 era, and this does hold current hardware back a bit: see how well the cool nonstandard and fast Cell CPU fared.
A language standard should assume as little about the hardware as possible, while also, ideally, allowing to describe properties of the hardware somehow. C tries hard, but the problem is not easy at all.
2 replies →
It’s worse—-almost all of them already use a nonstandard variant of C. The committee is bending over backwards to accommodate them, but they literally _do not care what the standard says_, so this doesn’t even benefit them. Most will keep using a busted C89 toolchain with a haphazard mix of extensions no matter what the standard does.
1 reply →
This is purely compiler side and usually those esoteric hosts are not running the compiler, being cross compiled but cross compiled, aren't they?
Well and studiously not talking to the few about their actual needs.
This reasoning has always rung mostly hollow for compiler features (#embed, typeof) rather than true language features (VLAs, closures).
Modern toolchains must exist for marginal systems. It's understandable to want to write code for a machine from 1975, or a bespoke MCU, on a modern Thinkpad. It is not necessary to support a modern compiler running on the machine from 1975 / bespoke MCU. You might as well argue against readable diagnostic messages because some system out there might not be able to print them!
I could also see this, though perhaps it's a step too far for C, applying to Unicode encoding of source files.
The 1970s mainframe this program will run on has no idea that Unicode exists. Fine. But, the compiler I'm using, which must have been written in the future after this was standardised, definitely does know that Unicode exists. So let's just agree that the program's source code is always UTF-8 and have done with it.
Jason Turner has a talk where the big reveal is, the reason the slides were all retro-looking was that they were rendered in real time on a Commodore 64. The program to do that was written in modern C++ and obviously can't be compiled on a Commodore 64 but it doesn't need to be, the C64 just needs to run the program.
6 replies →
> And when I say truly, I mean for platforms without file systems
Are we're really talking about compiling on such platforms? And if that's the case, how would #include work but not #embed?
No, I'm mainly talking about targeting. My point is not so much about embed, but rather that, almost anything you assume you think you know about how computers work isn't necessarily true, because C targets such a wide group of platforms. Almost always when some one raises a question along the line of "No platform has ever done that right?", some one knows of a platform that has done that, and it turns out has very good reasons for doing that.
For this reason, everything is much more complicated then you first think. For me joining the WG14 has been an amazing opportunity to learn the depths of the language. C is not big but it is incredibly deep. The answer to "Why does C not just do X?" is almost always far more complicated and thought through than the one thinks.
Everyone in the wg14 who has been around for a while, knows this, and therefore assumes that even the simplest addition will cause problems, even if they cant come up with a reason why.
27 replies →
"""Codify existing practice to address evident deficiencies. Only those concepts that have some prior art should be accepted. (Prior art may come from implementations of languages other than C.) Unless some proposed new feature addresses an evident deficiency that is actually felt by more than a few C programmers, no new inventions should be entertained."""
Source: Rationale for International Standard — Programming Languages — C https://www.open-std.org/jtc1/sc22/wg14/www/C99RationaleV5.1...
I don't know if this rationale is still followed, but I think it applies here. We need to be cautious when adding new features to C.
Well, basic string support would be fine, wouldn't it? The C standard still having no proper string library after decades didn't harm its popularity, but still:
you cannot find non-normalized substrings (strings are Unicode nowadays), and UTF-8 is unsupported. coreutils and almost all tools don't have proper string (= Unicode) support.
> where NULL isn't on address 0
Isn't there literally just a single GPU for which that is true?
Asking because every time this surfaces, someone inevitably asks for an example, and the only example I've seen over the years was of one specific (Nvidia?) GPU that uses a NULL of 0xFFFFFFFA (or something similar).
That is, do you know how common it is for NULL to not be 0?
There are a lot of platforms where you might want to do this. If you're programming bare-metal, "address 0" might be a physical address that you expect stuff to exist at, so it might be relevant to use the bit pattern 0xffffffff instead. If you're targeting a blockchain or a WASM VM, you may also not have memory protection to work with, just a linear array of memory. And some machines don't even have plain bit patterns for pointers, like, say, a Lisp machine.
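For the curious, here's a minimal sketch of the distinction the standard actually draws; on mainstream platforms both tests print 1, but only the first is guaranteed:

    #include <stdio.h>
    #include <string.h>

    int main(void) {
        int *p = 0;               /* null pointer constant: guaranteed null */
        int *q;
        memset(&q, 0, sizeof q);  /* all-bits-zero representation: not guaranteed null */

        printf("%d\n", p == NULL);  /* always 1 */
        printf("%d\n", q == NULL);  /* 1 on mainstream targets; no guarantee in general */
        return 0;
    }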
It's true (in some memory spaces) on AMD GPUs too:
https://llvm.org/docs/AMDGPUUsage.html#memory-spaces
Here is an answer that includes a few example systems, from comp.lang.c:
https://c-faq.com/null/machexamp.html
People who call C simple have some weird definition of simple. How many C programs contain UB or are pure UB? Probably 95%+. The language's not simple at all.
A straight razor is simple and that's why it's the easiest to cut yourself with. An electric razor is much safer precisely because much engineering went into its creation.
Thank you for your post!
Thank you especially for reminding everybody that programming is much more than web programming and information systems.
It's also worth remembering that a lot of higher-level languages have runtimes/VMs implemented in C. Web applications rely heavily on databases, JavaScript VMs, network stacks, system calls and operating system features, all of which are implemented in C.
If you are a software developer and want to do something about climate change, consider becoming a compiler engineer. If you manage to get a couple of tenths of a percent performance increase into one of the big compilers during your career, you will have materially impacted global warming. Compiler engineers are the unsung heroes of software engineering.
How would such a platform without file systems handle #include?
Reading further, I don't think this was ever addressed when someone else brought it up. I cannot for the life of me imagine a system where #include works but #embed doesn't. Again, it's fine if some systems support non-standard subsets of the C standard... but why hobble the actual standard, and the code compiled on systems that do have a filesystem (which handles #include, by the way), for the sake of the systems without filesystems?
> How would such a platform without file systems handle #include?
I don't think it would; you'd cross-compile for it on a platform with a file system. I think the parent poster's point was that C is the only option for some ultra-low-resource platforms and that a conservative approach should be taken to adding new features in general. I don't think they were saying specifically that not having a filesystem is problematic for this particular inclusion.
#include is with regard to the build platform, not the target platform; you (generally) need a filesystem to compile, but you don't need a filesystem to run what you compiled.
Congratulations! #embed is a very useful feature.
If I may gripe about C for a bit though. I do truly appreciate C's portability. It's possible to target a very diverse set of architectures and operating systems with a single source. Still, I do wish it would actually embrace each architecture, rather than try to mediate between them. A lot of my gripes with C are due to undefined behaviour which is left as such because of platform differences. I've never seen my program become faster if I remove `-fwrapv -fno-strict-aliasing`, but it has resulted in bugs due to compiler optimisations. I really wish by default "undefined behaviour" would become "platform-specific behaviour", with an officially blessed way to tell the compiler it can perform further optimisations based on data guarantees.
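The strict-aliasing half of that gripe looks like this in practice (a minimal sketch; memcpy is the portable idiom):

    #include <stdint.h>
    #include <string.h>

    /* Undefined behaviour: a float object is read through an incompatible
       uint32_t lvalue, so the optimiser may assume the two never alias
       unless you compile with -fno-strict-aliasing. */
    uint32_t float_bits_punned(float f) {
        return *(uint32_t *)&f;
    }

    /* Well-defined everywhere: copy the object representation instead. */
    uint32_t float_bits_copied(float f) {
        uint32_t u;
        memcpy(&u, &f, sizeof u);
        return u;
    }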
C occupies a very pleasant niche where it lets you write software for the actual hardware, rather than for a VM, while still being high level enough to allow for expressiveness in algorithms and program organisation. I just wish by default every syntactically valid program would also be a well-defined program, because the alternative we have now makes it really hard to reason about and prove program correctness (i.e. that it does what you think it does).
Thanks for your work on the C standard. Any changes that are made will remain forever, so I'm glad the committee takes this seriously.
Seems like a nice addition. Much better than futzing around with xxd and suchlike.
I'm curious what you think of UB from a standard perspective --- were things left undefined and not just implementation-defined because there was simply so much diversity in existing and possibly future implementations that specifying any requirements would be unnecessarily constraining? I can hardly believe that it was done to encourage compiler writers to do crazy nonsensical things without regard for behaving "in a documented manner characteristic of the environment" which seems like the original intent, yet that's what seems to have actually happened.
> I'm curious what you think of UB from a standard perspective
I think a lot about that! I'm a member of the UB study group and the lead author of a Technical Report we hope to release on UB.
In short, "undefined behavior" is poorly named. It should have been called "things compilers can assume the program won't do". With what we call "assumed absence of UB", compilers can, and do, do a lot of clever things.
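A minimal example of what "assumed absence of UB" buys:

    /* Signed overflow is undefined, so the compiler may assume it never
       happens; under that assumption x + 1 > x always holds, and GCC and
       Clang fold the whole function to "return 1" at -O2. Compile with
       -fwrapv and the assumption goes away. */
    int always_true(int x) {
        return x + 1 > x;
    }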
Until we get the official TR out, you may find a video I made on the subject interesting:
https://www.youtube.com/watch?v=w3_e9vZj7D8
> And when I say truly, I mean for platforms without file systems, or operating systems or where bytes aren't 8 bits, that doesn't use ASCI or Unicode, where NULL isn't on address 0 and so on.
Genuine question: why do we want these platforms to live, rather than to be forced to die? They sound awful.
I understand retrocomputing, legacy mainframes, etc.; but 99% of that work is done in non-portable assembler and/or some flavor of BASIC, not in C.
Many of these platforms are microcontrollers, DSPs or other programmable hardware that are in every device nowadays, so it's not retro, it's very much current technology.
Because they weren't necessarily awful. Because in the future we may discover we need to do "weird" things again, for performance or other reasons.
Ha, I suggested this on the C++ proposals mailing list 7 years ago:
https://groups.google.com/a/isocpp.org/g/std-proposals/c/b6n...
Enjoy the naysayers if you like! I'm glad someone spent the time and effort to push past them. Bit too late for me - I have moved on to Rust which had support for this from version 1.0.0.
> There's also the standard *nix/BSD utility "xxd".
> Seems like the niche is filled. Or, at least, if you want to claim that
> (A) XPM
> (B) incbin
> (C) "xxd -i"
> (D) various ad-hoc scripts given in http://stackoverflow.com/questions/8707183/script-tool-to-co...
>...do NOT completely fill this evolutionary niche
> This ultimately would encourage a weird sort of resource management philosophy that I think might be damaging in the long run.
> Speaking from experience, it is a tremendously bad idea to bake any resource into a binary.
> I'll point out that this is a non-issue for Qt applications that can simply use Qt's resources for this sort of business.
(Though credit to Matthew Woehlke, he did point out a solution which is basically identical to #embed)
> I find this useless specially in embedded environments since there should be some processing of the binary data anyway, either before building the application
In fairness, there was a decent amount of support. But given the insane amount of negativity around an obviously useful feature, I gave up.
I wonder if there was a similar response to the proposal to include `string::starts_with()`...
> > Speaking from experience, it is a tremendously bad idea to bake any resource into a binary.
What a pompous douche whoever wrote that was.
> > This ultimately would encourage a weird sort of resource management philosophy that I think might be damaging in the long run.
So, this might be a valid point, although not enough to reject the feature over. It's true that it's a feature that could potentially see overuse and abuse. But then, so did templates :-P
> What a pompous douche whoever wrote that was.
And clearly someone who had never once written code for a system without a filesystem.
What is the Rust equivalent of #embed?
https://doc.rust-lang.org/std/macro.include_bytes.html
> told me this form was non-ideal and it was worth voting against (and that they’d want the pure, beautiful C++ version only[1])
I heard about #embed, but I didn't hear about std::embed before. After looking at the proposal, to me it does look a lot better than #embed, because reading binary data and converting it to text, only to then convert it to binary again seems needlessly complex and wasteful. I also don't like that it extends the preprocessor, when IMHO the preprocessor should at worst be left as is, and at best be slowly deprecated in favour of features which compose well with C proper.
Going beyond the gut reaction and moving on to hard data, as you can expect from this design, std::embed of course is faster during compilation than #embed for bigger files (comparable for moderately-sized files, and a bit slower for tiny files).
I'm not a huge fan of C++, but the fact that C++ removed trigraphs in C++17 and that it's generally adding features replacing the preprocessor scores a point with me.
[1]: <https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/p10...>
Compilers follow the "as if" principle, they don't have to literally follow the formal rules given by the standard. They could implement #embed by doing as you say, pretty printing out numbers and then parsing them back in again. But that would be an extremely roundabout way to do it, so I doubt anyone will actually do it that way. Unless you're running the compiler in some kind of debugging mode like GCC's -E.
I don’t think the implication is that the C compiler must encode the binary file as a comma-separated integer list and then re-parse it, only act as if it did so.
How would that work? It would need to depend on the grammar of the surrounding C code. This directive isn't limited to variable initialisers. You can use it anywhere: inside a structure declaration, or between "int main()" and "{", and so on. Those will generate errors in subsequent phases, but during preprocessing the compiler doesn't know about it.
There are plenty of cases where it will all behave differently. And if you're going to pretend even more that the preprocessor understands C syntax, then why not just give this job to the compiler proper, which actually understands it?
The preprocessor is a great tool to reduce duplication and boilerplate.
People that don't like it generally just don't know how to use it.
People don't dislike it because they are unaware of how helpful it can be. They dislike it because they are aware of how hacky, fragile and error-prone it is. They want something more robust than text substitution.
People that don't like it generally have used macros that are more sophisticated than just blindly copy-pasting text into your source files, and have become aware of how absurd that is.
Or perhaps those people can think of better ways to get those benefits, ways that don't also allow things like the following contrived sketch of a syntax-bending macro:
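    /* Nothing stops macros from rewriting the surface syntax of the
       language, at which point an IDE can no longer parse the file as C. */
    #define BEGIN {
    #define END   }

    int main(void) BEGIN
        return 0;
    END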
Macros like these obliterate tooling such as IDEs. Of course, this is a contrived example, but the preprocessor is just one big footgun, which offers no benefits over other ways of solving the problems you mentioned, such as constexpr and perhaps additional, currently unimplemented solutions.
This serves the same use as Rust's `include_bytes!` macro, right? Presumably most people just use this feature as a way to avoid having to stuff binary data into a massive array literal, but in our case it's essential because we're actually using it to stuff binaries from earlier in our build step into a binary built later in the build step. Not something you often need, but very handy when you do.
This has different affordances than std::include_bytes! but I agree that if you were writing Rust and had this problem you'd reach for std::include_bytes! and probably not instead think "We should have an equivalent of #embed".
include_bytes! gives you a &'static [u8; N] which for non-Rust programmers means we're making a fixed size array (the size of your file) full of unsigned 8-bit integers (ie bytes) which lives for the life of the program, and we get an immutable reference to it. Rust's arrays know how big they are (so we can ask, now or later) but cannot grow.
#embed gets you a bunch of integers. The as-if rule means your compiler is likely to notice if what you're actually doing is putting those integers into an array of unsigned 8-bit integers and just stick all the file bytes in the array, short cutting what you wrote, but you could reasonably do other things, especially with smaller files.
For both Rust and C, these features "just" make something you could otherwise do with the build system and generated code easier, I think.
As the article quotes, in C the lack of standardisation makes this tricky when you want to support more than one compiler, or even when you want to support just one compiler (cf email about the hacks to make it work on GCC with PIE).
> Even among people who control all the cards, they are in many respects fundamentally incapable of imagining a better world or seizing on that opportunity to try and create one, let alone doing so in a timely fashion.
That does sound soul-crushing. Congrats on this achievement!
This is simply wrong. We (the ISO WG14) don't hold the cards: compilers are free to implement whatever they want, and users are free to use whatever tools or languages they want.
We exist only as long as we are trusted to be good stewards, and only go forward with the consensus of the wider community.
You're both right.
It's amazing that you and the ISO team are good stewards of the C standard. Thank you for being part of that.
And it can also be true that it was "hell" and "hardly worth it" for the OP to get a new feature added to the language. I believe it was a miserable experience that has him questioning how he spends his time.
Both can be true. Thank you for your efforts. And thank the OP for his efforts too.
> > Even among people who control all the cards, they are in many respects fundamentally incapable of imagining a better world or seizing on that opportunity to try and create one, let alone doing so in a timely fashion.
> This is simply wrong. We (the ISO wg14) don't hold the cards, compilers are free to implement what ever they want, users are free to use what ever tools or languages they want.
This is an incredibly oblivious demonstration of JeanHeyd's point.
> (the ISO wg14) don't hold the cards
That "standard" card seems to be a pretty huge one, though.
I think in our reality the prerequisite for holding all the cards is a lack of competence in knowing how to improve the world. We've gotten where we are now through the sheer force of will of those that are empty-handed.
The reasonable man adapts himself to the world: the unreasonable one persists in trying to adapt the world to himself. Therefore all progress depends on the unreasonable man.
George Bernard Shaw
This reminds me, I'd argue that the explosion of JS frameworks can be mainly blamed on one thing: the lack of an <include src="somemodule.html"> tag. If you have that you basically have 80% of vue.js already natively supported. No clue why this was never added in any fashion. Change my mind.
HTML imports were part of the original concept of Web Components, and I think they were supported in Chrome. If you look up examples of things built with Polymer 1.x, it was used extensively.
It was actually pretty neat, because you could have an HTML file with a template, style, and script section.
Safari rejected the proposal, so it had to get dropped.
But ESM makes it a bit redundant anyway. The end-goal is to allow you to import any kind of asset, not just JS. There have been demos and examples of tools supporting this going back over half a decade at this point.
Firefox refused the proposal as well. ESM requires JavaScript though. :/
It's funny, reading that reminds me of Apache's virtual-include facility, which looked something like this (from memory):
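    <!--#include virtual="/cgi-bin/menu.cgi" -->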
I used that, back in the day, as an alternative to PHP.
Wouldn't the include still need some templating functionality? Or are people using vue that heavily for just importing static html?
Not the parent comment, but my personal use case is for rendering a selectable list. The server side would render a static list with fragment links (ex. `#item-10`) and include elements with corresponding IDs, and a `:target` CSS rule to unhide the element. This would hopefully be paired with lazy loading of the include elements.
edit:
My goal is to avoid reloading the page for each selection and rendering all items eagerly. JS frameworks are the only ones that really allow this behavior.
> https://caniuse.com/imports
It was a feature in Chrome 36-79 and there were working polyfills to make it work on other browsers.
It was actually a great feature and I used it extensively on an old project back then.
CanIUse: https://caniuse.com/imports
(Now obsolete) tutorial: https://www.sitepoint.com/introduction-html-imports-tutorial...
I wonder why there's never been an element like that in HTML, basically what PHP does but with structure and objects instead of a bytestream. Or maybe it's been discussed but got left out.
It's always been possible:
<script>document.write(`<p>foo</p>`)</script>
How would <include> be useful for dynamically updating the DOM based on data, which is the main point of Vue?
Is <script type="module" /> not sufficient for your needs? If not then what is missing?
Seems to be arguing for modular layout/templating, which is what virtual includes did (the cgi in the example would hypothetically output html)
Honestly I'm usually very wary of additions to C, as one of its greatest strengths (to me) is how rather straightforward it is as a language in terms of conceptual simplicity. There just aren't that many big concepts to understand in the language. (On the other hand there's _many_ footguns but that's another issue.)
That said, to me this seems like a great addition to the language. It's very single-purpose in its usage (so it doesn't seem to add much conceptual complexity to the language) and it replaces something genuinely painful (arcane linker hacks). I'm very much looking forward to using this as I often make single-executable programs in C. The only thing that's unfortunate is I'm sure it'll take decades before proprietary embedded toolchains add support for this.
C23 and C26 are basically heading into C++ without classes.
There is way too much in C already.
The first commandment of C is: 'writing a naive C compiler should be "reasonable" for a small team or even one individual'. That's getting harder and harder, longer and longer.
I did move from C being "the best compromise" to "the least bad compromise".
I wish we had a "C-like" language, which would kind of be a high-level assembler which: has no integer promotion or implicit casts, has compile-time/runtime casts (without the horrible c++ syntax), has sized primitive types (u64/s64,f32/f64,etc) at its core, has sized literals (42b,12w,123dw,2qw,etc), has no typedef/generic/volatile/restrict/etc well that sort of horrible things, has compile-time and runtime "const"s, and I am forgetting a lot.
Among the main issues: the kernel GCC C dialect (roughly speaking, each Linux release uses more GCC extensions), and aggressive optimizations that can break some code (while programming some hardware, for instance).
Maybe I should write assembly, expect RISC-V to be a success, and forget about all of this.
I wish we had something like typed Lua without Lua's weird quirks (e.g. indexing from 1), designed with performance and safety in mind, and with the features you mention.
But like Lua, the base compiler is really small and simple and can be embedded. And it's "pseudo-interpreted": ultimately it's an ahead-of-time language, to support things like function declarations after references and proper type checking, but compiling unoptimized is practically instant and you can load new sources at runtime, start a REPL, and do everything else you can with an interpreted language. Now, having a simple compiler with all these features may be impossible, so worst case there is just a simple interpreter, a separate type-checker, and a separate performance-optimized JIT compiler (like Lua and LuaJIT).
Also like Lua and high-level assembly, debugging unoptimized is also really simple and direct. By default, there aren’t optimizations which elide variables, move instructions around, and otherwise clobber the data so the debugger loses information, not even tail-call optimization. Execution is so simple someone will create a reliable record-replay, time-travel debugger which is fast enough you could run it in production, and we can have true in-depth debugging.
Now that I've written all that, I realize this is basically ML. But OCaml still has weird quirks (the object system), SML too honestly, and I doubt their compilers are small and simple enough to be embedded. So maybe a modern ML dialect with a few new features and none of the more confusing things which are in Standard ML.
Check out Nim! It does much of what you describe and it's great. The core language is fairly small (not quite Lua-simple but probably ML-comparable). It compiles fast enough that a Nim REPL like `inim` is usable to check features and for basic maths; it requires a C compiler, but TCC [4] works perfectly. Essentially Nim + TCC is pretty close to your description, IMHO. Though I'm not sure TCC supports non-x86 targets.
I've never used it, but Nim does support some hot reloading as well [3]. It also has a real VM if you want to run user scripts, and has a nice library for it [1]. It's not quite Lua-flexible, but for a generally compiled language it's impressive.
Recently I made a wrapper to embed access to the Nim compiler's macros at runtime [2]. It took 3-4 hours probably and still compiles in tens of seconds despite building in a fair bit of the compiler! It was useful for making a code generator for a serializer format. Though I'm not sure it's small enough to live on even beefy M4/M7 microcontrollers. Though I'm tempted to try.
1: https://github.com/beef331/nimscripter 2: https://github.com/elcritch/cdecl/blob/main/src/cdecl/compil... 3: https://nim-lang.org/docs/hcr.html 4: https://bellard.org/tcc/
> I wish we had a "C-like" language, which would...

How about https://ziglang.org/ ?
GCC or Clang with all warnings turned on will give you almost what you want. -Wconversion -Wdouble-promotion and 100s of others. A good way to learn about warning flags (apart from reading the docs) is Clang -Weverything, which will give you many, many warnings.
Not an exact match, but a close one: https://odin-lang.org/
I agree (with a lot of caveats), but a key value of C is that we do not break people's code, and that means that we can't easily remove things. If we do, we create a lot of problems. This makes it very difficult to keep the language as easy to implement as we would like. As a member of WG14, I intend to propose that we do make this our prime priority going forward.
> I wish we had a "C-like" language, which would kind of be a high-level assembler which: has no integer promotion or implicit casts, has compile-time/runtime casts (without the horrible c++ syntax), has sized primitive types (u64/s64,f32/f64,etc) at its core, has sized literals (42b,12w,123dw,2qw,etc), has no typedef/generic/volatile/restrict/etc well that sort of horrible things, has compile-time and runtime "const"s, and I am forgetting a lot.
Unsafe Rust code I think fits this model better than C does: it relies on sized primitive types, it has support for both wrapping and non-wrapping arithmetic rather than C's quite frankly odd rules here, it has no automatic implicit casts, it has no strict aliasing rules.
> The first commandment of C is: 'writing a naive C compiler should be "reasonable" for a small team or even one individual'. That's getting harder and harder, longer and longer.
100% agreed. I've always viewed C as a "bootstrappable" language, in which it is relatively straightforward to write a working compiler (in a lower level language, likely Asm) which can then be used to bring up the rest of an environment. The preprocessor is actually a little more difficult in some respects to get completely correct, and arguably #embed belongs there, so it's debatable whether this feature is actually adding complexity to the core language.
Your wish for a "C-like" language sounds very much like B.
Time for a B+ language?
There is so much more to remove: one loop statement is enough (loop {}), enum should go away along with the likes of typeof, etc.
I wonder if all that would make writing a naive "B+" compiler easier (time/complexity/size) than a plain C compiler. I stay humble, since I know removing things does not always mean easier and faster; the real complexity may be hidden somewhere else.
Are you a programmer? #embed is the easiest feature to implement that I have ever heard of.
I think the blog post provides some insight into the challenges of implementing this.
I’m really amazed at how divisive this one is, and the number of comments here questioning what seems to me a really useful and well-thought-out feature, something I’d have loved to have used many many times over the years.
I guess the heated arguments here help me understand how it could have taken so long to get this standardised, though, so that’s something!
Congratulations and thank you to the OP for doing this, and thanks also for this really interesting (if depressing) view of the process.
This is a really, really good feature and I am so glad it is finally getting standardized. C23 is shaping up to be a very good revision to the C standard. I'm hoping the proposal to allow redeclaration of identical structs gets in as well, as you would finally be able to write code using common types without having to coordinate, which would allow interoperability between independently written libraries.
Congratulations to the author. Things like this are why I hope Carbon succeeds. Evolving C++ seems like a dumpster fire, despite whatever compelling arguments about compatibility you are going to drop on me.
The issue is that a lot of people just think about languages in a wrong way, which is the whole reason for pointless things like C++ expansions, Carbon, Rust, and stuff like this.
One of the fundamental ideas that people run with in language creation/expansion is "the programmer is stupid and/or makes mistakes" -> "let's add language features that intercept and control their stupidity/mistakes".
And there is a very valid reason for this: it allows programmers of lesser skill and knowledge to pick up codebases and develop safe software, which has economic advantages in being able to hire less experienced devs to write software at lower salary points and spend next to no time fixing segfault issues due to complex memory management. The whole reason Java got so popular over C++ was its GC: both C++ and Java supported fairly strong typing with classes, but C++ still had a lot of semantics around memory management that had to be taken care of, whereas with Java you simply don't do anything.
However, people are applying this idea towards lower level languages, because they want the high performance of a compiled language with a whole bunch of features that make writing code as mistake free as possible. And my challenge to that is this - why not spend the time making just smarter compilers/tooling?
Think about a hypothetical case where Rust gets all the features added to it that people want, and is widely used as the main language over all others. Looking at all the code bases, there will be a lot of common use patterns, a lot of the safety code duplicated over and over in predictable patterns, etc. And you will see these common things added to Rust. Just like with Java, a lot of the predictable use patterns got abstracted into widely used libraries like Lombok, Spring, etc., where you don't have to worry about correctness in lieu of using a library. And you essentially will start to move towards more and more stuff being handled for you automagically, which is all part of the compiler/toolchain.
In the same way, #embed can be solved by smart compiler. Have a static string that opens a file, and read contents into a buffer that doesn't change? Auto include that file in a binary if you want to target performance rather than executable size. No need for special instruction, just be smart about how you handle an open call, and leave the fine tuning of this to specific compiler options.
And from an economic perspective of ease of use from above, you would have a language like Python which is super easy to pick up and program in, except instead of the interpreter, you would have a compiler that will spit out binaries. Python is already widely adopted primarily of how easy it is to set up and use. Now imagine if you had the option to run a super smart compiler that highlights any potential issues that come with dynamic typing because it understands what you are trying to do, fixes any that it can, and once everything is addressed, it spits out an optimized memory safe executable. With Rust, you code, compile, see you made a mistake somewhere with a reference, fix it, repeat. With this, you would code, compile, fix the mistake somewhere that the compiler warns you about, repeat. No difference.
Focusing on the toolchain also lets you think about integrating features from languages like Coq with provability, where you can focus not only on correctness processing/memory-wise, but also on "is the output actually correct". I.e., any piece of code, for all given input, can be specified to have a guaranteed bounded output set, which you can integrate into IDE tools to get real-time feedback and design the code in a way that avoids things like URL parsing mistakes, which all the safety features of Rust won't catch.
As for C, you leave it at a version that has a stable, robust ABI, and then anything that you need to support will be delegated to custom tools. That way, in a future where compute will likely be full of specialized ML chips, instead of worrying about writing a frontend to support every feature, you quickly get a notional toolchain made and are able to run existing C code.
Re: #embed </dev/urandom>
Just a random thought, but I'd expect a compiler to do exactly what's described if I tell it something like this (a sketch of the idea, not current C23 syntax):
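    /* the explicit array size would tell the compiler how many bytes to
       take, so even an endless file like /dev/urandom would be fine */
    unsigned char noise[16] = {
    #embed "/dev/urandom"
    };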
This would address the most common case with infinite files, and then just let the compiler error out if the array size is not specified.
I would expect that to produce an error, though. If I had a regular file that was not infinite in size, and I specified the wrong length for the array, I would find it more useful to have the compiler inform me as to the discrepancy rather than truncate my file.
A warning, not an error. Both under and over-population can be valid use cases.
Reproducible builds crowd wailing in agony
The context is that of infinite files, not of the urandom specifically. Give the linked post a read for details.
They just need a reproducible urandom.
The preprocessor needs to run before the compiler, though, and isn't complex enough to understand the context of the code that it's in. That would be a substantially complex thing to implement.
This would indeed require delaying population of the array to the compilation stage. However, it's worth it for the convenience and the succinctness of the syntax, and it's not that substantially complex to implement.
Interesting. I look forward to this. What I've been doing now to embed a source.png file is something like this, where I generate source code from a file's data:
in embed_dump.cpp:
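A minimal sketch of such a generator (names are illustrative, not my exact code):

    #include <stdio.h>

    int main(int argc, char **argv) {
        if (argc != 2) { fprintf(stderr, "usage: %s FILE\n", argv[0]); return 1; }
        FILE *f = fopen(argv[1], "rb");
        if (!f) { perror(argv[1]); return 1; }
        printf("static const unsigned char embedded_data[] = {");
        int c;
        long n = 0;
        while ((c = fgetc(f)) != EOF)
            printf("%s0x%02x,", (n++ % 12) ? " " : "\n    ", c);  /* 12 bytes per line */
        printf("\n};\n");
        fclose(f);
        return 0;
    }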
Then I set up my makefile like this (main_stuff.cpp #includes embedded_files.h):
What about creating object files from raw binary files and then linking against them? That's what I (and of course many others) do for linking textures and shaders into the program. It's a bit ugly though that with this approach you can't generate custom symbol names, at least with the GNU linker.
This #embed feature might be a nice alternative for small files. Well, for large files you usually don't even want to store them inside the binary, so the compilation overhead might be minuscule, since the files are, by intention, small.
When I read the introduction of the article - about allowing us to cram anything we want into the binary - I was hoping to see a standard way to disable optimizations (for when the compiler deletes your code and you don't even notice).
One reason against this is mentioned in the letter that is quoted in the article.
It depends on your definition of small files. A few hundred kB to a few megabytes will make compilation speed and memory usage explode if you embed it as text, see section 3.2 in https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3017.htm#i...
You reminded me of Bethesda Softworks games, which always seem to have 1GB+ executables for some reason. I hope it isn't all code. Maybe they embed the most important assets that will always need to be loaded.
My guess is that the files are not truly embedded, as that would require loading the entire file into memory before running the application, which seems wasteful.
More likely, the actual executable is only a small part of the file which accesses the rest of the file as an archive, like a self-extracting zip. There may also be some DRM trickery going on.
vmprotect and other DRM schemes will also bloat those sizes.
Although I've only seen 1GB+ PDBs; executables are always in the hundreds of megabytes.
Off the top of my head, I think there's some niche use in embedding shaders so that they don't need to be stored as strings (no IDE support) or read at runtime (slower performance).
There are a lot of use cases for baking binary data directly into the program, especially in embedded applications. For instance, if you are writing a bootloader for a device that has some kind of display you might want to include a splash screen, or a font to be able to show error messages before a filesystem or an external storage medium is initialized. Similarly, on a microcontroller with no external storage at all you need to embed all your assets into the binary; the current way to do that is to either use whatever non-standard tools the manufacturer's proprietary toolchain provides, or to use xxd to (inefficiently) generate a huge C source file from the contents of the binary file. Both require custom build steps and neither is ideal.
Another typical use is embedding a public-key in an application or firmware.
You can get some IDE support with a simple preprocessor macro[1].
It's a crutch, but at least you don't need to stuff the shader into multiple "strings" or have string continuations (\) at the end of every line. Plus you get some syntax highlighting from the embedding language. I.e. the shader is highlighted as C code, which for the most part seems to be close enough.
[1] https://github.com/phoboslab/pl_mpeg/blob/master/pl_mpeg_pla...
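The trick is along these lines (a sketch in the spirit of the linked code, not a copy of it):

    /* Stringify the shader body so it can be written as bare tokens and
       still get C-ish syntax highlighting. The #version line is prepended
       as a normal string literal, because preprocessor directives cannot
       survive inside a macro argument. */
    #define GLSL(...) "#version 330\n" #__VA_ARGS__

    static const char *vertex_src = GLSL(
        layout(location = 0) in vec2 pos;
        void main() { gl_Position = vec4(pos, 0.0, 1.0); }
    );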
That's pretty clever
Thanks, I'll remember to use that in my future shaders
Nice for binary shaders too, e.g. SPIR-V bytecode generated by glslc.
Other stuff:
https://twitter.com/rcs/status/1550526425211584512
nullptr! auto! constexpr!
Not sure about the value of nullptr! Also not sure about auto! In C.
nullptr, since we have type detection now and NULL isn't required to be a pointer. auto, because otherwise everybody would create their own hacky auto using the new typeof.
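A sketch of what that buys once generic selection is in play (requires a C23 compiler):

    #include <stddef.h>
    #include <stdio.h>

    #define KIND(x) _Generic((x),            \
        int:       "int",                    \
        void *:    "void *",                 \
        nullptr_t: "nullptr_t",              \
        default:   "something else")

    int main(void) {
        puts(KIND(NULL));    /* "int", "void *" or "nullptr_t": depends on how NULL is defined */
        puts(KIND(nullptr)); /* always "nullptr_t" */
        return 0;
    }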
If you want to start playing with this now, my C preprocessor Cedro (https://sentido-labs.com/en/library/
Looks great. I've been writing cmake hacks to include assets in executables for too long.
Can’t you just add binary data into a custom section of your ELF executable?
Yes, but it's linker-specific and non-portable. It can also come with some annoying limitations, like having to separately provide the data size of each symbol. In some cases this might be introspectable, but again comes at the expense of portability.
ELF-based variants of the IAR toolchain, for example, provide a means of directly embedding a file as an ELF symbol, but without the size information being directly accessible.
GNU ld and LLVM lld do not provide any embedding functionality at all (as far as I can see). You would have to generate a custom object file with some generated C or ASM encoding the binary content.
MSVC link.exe doesn't support this either, but there is the "resource compiler" to embed binary bits and link them in so they can be retrieved at runtime.
Having a universal and portable mechanism which works everywhere will be a great benefit. I'll be using it for compiled or text shaders, compiled or text lua scripts, small graphics, fonts and all sorts.
This article[1] shows how you can use GCC toolchain along with objcopy to create an object file from a binary blob, link it, and use the data within in your own code.
[1] https://balau82.wordpress.com/2012/02/19/linking-a-binary-bl...
The article addresses this directly. If you're only targeting one platform then this is reasonably easy (albeit still not as easy as #embed), but if you need to be portable then it becomes a nightmare of multiple proprietary methods.
You probably don't realize that not every system is using ELF binaries....
Sure, but to add binary data to any executable on any platform is more involved.
As an example, see [1]. That will turn any file into a C file with a C array, and I use it to embed a math library ([2]) into the executable so that the executable does not have to depend on an external file.
[1]: https://git.yzena.com/gavin/bc/src/branch/master/gen/strgen....
[2]: https://git.yzena.com/gavin/bc/src/branch/master/gen/lib.bc
Then you don’t get regular optimizations like deduping identical declarations.
Or source line location debug info, though nobody tries to show that for data at the moment.
The less you have to mess with linker scripts, the better.
5 years ago I wrote a small Python script [1] to help me solve "the same problem". It reads files in a folder and generates a header file containing the files' data and filenames. It's very simple and was written to help me on a job. It has limitations, don't be too hard on me :)
[1] https://github.com/daxliar/pyker
This will simplify a lot of build pipelines for sure.
One thing that isn't clear from skimming the article, how do you refer to the embedded data again?
> The directive is well-specified, currently, in all cases to generate a comma-delimited list of integers
I.e. you most likely use it to initialize a static variable, and then refer to that variable.
Ah so this basically?
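Something like this (sketch; the file name is made up):

    static const unsigned char data[] = {
    #embed "data.bin"
    };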
EDIT: ah it was showing up more like a comment which made it hard to spot.
Everlasting glory to JeanHeyd Meneide, @thephantomderp, for getting this feature into C.
I am wondering, though - where does this stand in C++?
This is a cool feature and I'll likely be using it in the years to come. However, the posix standard command xxd and its -i option can achieve this capability portably today.
It will be useful to have it directly in the preprocessor, however. I wonder how quickly it can be added to cpp?
I'm pretty sure xxd is not part of POSIX https://pubs.opengroup.org/onlinepubs/9699919799/idx/utiliti...
I've always used xxd -i for embedding, doesn't have the mentioned problems and works everywhere, as it simply outputs a header file with byte array.
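For reference, its output is shaped like this (truncated; only the 8-byte PNG signature is shown):

    $ xxd -i logo.png
    unsigned char logo_png[] = {
      0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a
    };
    unsigned int logo_png_len = 8;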
Well, congratulations, you now have a build dependency on Vim. (xxd is not a standard tool, it ships with Vim.)
It's also only suitable for tiny files: compile time and RAM requirements will blow up once you go beyond a couple of megabytes.
> It's also only suitable for tiny files: compile time and RAM requirements will blow up once you go beyond a couple of megabytes.
Do you know what makes it so? Is there a technical argument why the compiler could do better, except maybe for xxd not being specifically optimized for this use case?
Yeah, these are reasonable arguments against it
The article spends a fair bit of time discussing the build speed and memory use problems with that approach. Like, the benchmark results [0] linked to from this post literally have xxd as one of the rows. It's not a viable option for embedding megabytes of data.
[0] https://thephd.dev/embed-the-details#results
And even if the data is small enough, not every C programmer uses Unix or knows their way around it.
But you have to have build system stuff for that, and it's obviously non-portable.
True, I don't personally ever have a problem with this because I always compile from a Unix system anyway (even for Windows).
It is something that I had wanted in C for a while too, so I am glad that they added this #embed directive.
How do you read unsigned data? Is there a standardized parameter, or does this require a vendor extension?
You just make your array type uint8_t or whatever you need as long as it supports integer literals. See section 4 in https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3017.htm#i...
Scary, it's as if the preprocessor has become type-aware. I guess I'd better not imagine the result of the preprocessing as looking similar to, and following the same rules as, something I would have written by hand. This might make manual inspection of the preprocessed file a bit painful.
What situations is this useful for?
One particular scenario that people have highlighted is developing for an embedded system that doesn't have any storage except flash memory, and no filesystem. In this kind of system, embedding static resources in the executable is the only reasonable option you have.
> “Touch grass”, some people liked to tell me. “Go outside”, they said (like I would in the middle of Yet Another Gotdang COVID-19 Spike). My dude, I’ve literally gotten company snail mail, and it wasn’t a legal notice or some shenanigans like that! Holy cow, real paper in a real envelope, shipped through German Speed Mail!! This letter alone probably increases my Boomer Cred™ by at least 50; who needs Outside anymore after something like this?
Touch grass indeed. Sure, #embed is a nice feature, but this self-indulgent writing style I can’t stand.
Maidenless commenter.
This is wrong.
It belongs in the linker and you can pull the symbol it creates in with extern. I’ve been doing this for about 25 years.
...and now your solution is non-portable and as a cross-platform developer you need to implement N different build scripts. This is far more elegant.
No C toolchain is portable!
If that's a problem, use Go or another higher level language.
C89 is where C should've stayed. If you need to convert a file to a buffer and stick that somewhere in your translation unit, use a build system. Don't fuck with C.
Nothing stops you from sticking to C89 if that's what you want. Many projects do, and the -std=c89 option will not disappear anytime soon.
Did you read the snail mail letter from someone who does just that?
> "Did you read the snail mail letter from someone who does just that?"
I did. The author struggled with embedding files into their executables using makefiles. We don't know anything else beyond that. So what?
People also struggle with memory management in C, an arguably much more difficult and widespread problem. Should we introduce a garbage collector into the C spec? How about we just pull in libsodium into the C standard library because people struggle with getting cryptography right?
OP mentions #embed was a multi-year long uphill battle, with a lot of convincing needed at every turn. That in itself is enough proof that people aren't in clear agreement over there being a single "right" solution. Hence, leave this task to bespoke build systems and be done with it. Let different build systems offer different solutions. Allow for different syntaxes, etc. Leave the core language lean.
It takes literally 5 minutes to write a Python script that does this.
It took a long time to get this adopted because people are most likely busy with things that cannot already be solved trivially.
I think it's nice that this will soon be possible to do without adding Python as a dependency in your build system.
> trivially
The article covers quite a few reasons why the way things are done without #embed are not quite as trivial as they seem.
I've been doing it for 20 years without a single issue, on fairly large files.
This proposal doesn't even allow compressing or encrypting the data.
Don't the people making the standards have other things to do, like integrating useful features, instead of duplicating incbin.h [0] years after that feature worked?
https://github.com/graphitemaster/incbin/blob/main/incbin.h
That doesn't work on MSVC without an external source-generating tool.
> The directive is well-specified, currently, in all cases to generate a comma-delimited list of integers.
While a noble act, this is nearly as inefficient as using a code generator tool to convert binary data into intermediate C source. Other routes to embed binary data don't force the compiler to churn through text bloat.
It would be much better if a new keyword were introduced that could let the backend fill in the data at link time.
You should read or re-read the article and references. There are multiple benchmarks showing this not to be the case. Actually, half the article is a (well deserved) rant about how wrong compiler devs were in thinking that parsing intermediate C sources could ever match the new directive. A compiler's internal representation of an array of integers also doesn't require a big pile of integer ASTs.
According to the benchmarking data this extension is even 2x faster than using the linker `objcopy` to insert a binary at link time as you suggest.
C++ keeps kicking ass!
Feel sorry for crab people.
The article definitely isn't glowing praise of C/C++. In fact, including this simple, useful feature that Rust has had for a decade now has taken an immense amount of effort and received so much pushback from various parties, in part due to the strangled mess of various compiler limitations and in part because of design-by-committee stupidity.
C/C++ seems to be kicking its own ass.
Not to mention that it didn't even get into C++
The article doesn't even mention C++/Java.
That's not really the right conclusion to draw from this article
"crab people" means Rust people?
Officially, Rust's R-cog logo is the symbol of Rust. It is a registered trademark of the Foundation.
But it's a bit boring. Unofficially, Rust has a mascot, in the form of a crab named "Ferris". The crab mascot appears in lots of places, and the Unicode crab emoji U+1F980 is often used by Rust programmers to indicate Rust in text. Unlike the trademarked logo, you can have a bit of fun with such an unofficial symbol, for example Jon Gjengset's "Rust for Rustaceans" book cover has a stylised crab wearing glasses with a laptop apparently addressing a large number of other crabs.
> vendor extensions ... were now a legal part of the syntax. If your implementation does not support a thing, it can issue a diagnostic for the parameters it does not understand. This was great stuff.
I can't be the only one who thinks the magic comment is already an ugly escape hatch; adding a mini DSL to it that can mean anything to anyone just makes it ten times worse. It's neither beautiful nor great.
> do extensions on #embed to support different file modes, potentially reading from the network (with a timeout), and other shenanigans.
(Emphasis mine.) My god.
To be completely honest, I find the fact that this was raised by the committee to be really obtuse and unnecessary. The same "complaint" could be raised about #include as well.
If you want to include data from a continuous stream from a device node, then you could just as easily have the data piped into a temporary file of defined size and then #embed that. No need to have the compiler cater for a problem of your own making.
As for the custom data types: it's a byte array. Why not leave any structure you wish to impose on the byte array up to the user? They can cast it to whatever they like. Not sure why that's anything to do with the #embed functionality.
Both these things seem to be massive overthinking on the part of the committee members. I'm glad I'm not participating, and I really do thank the author for their efforts there. We've needed this for decades, and I'm glad it got in, even if those ridiculous extensions were the compromise needed to get it there.
I guess you don't know C.
"#" is not the symbol for a comment line but the one for a preprocessor directive, like #include <stdlib.h>.
In C/C++ you use // and /* */ for comments.
I’ve known C for close to two decades, thank you. I’m using the not at all well defined term “magic comment” to loosely refer to everything that’s not strictly speaking code but has special meaning, which include pre-processor directives.
cpp is definitely a well-hated part of C.
> (Emphasis mine.) My god.
Yes, C is finally catching up with what languages such as F# have been able to do for years with great success https://docs.microsoft.com/en-us/dotnet/fsharp/tutorials/typ... ; wild, isn't it, to step into the 2010 era of programming?
I suppose you’re of the opinion that every feature of every language should be added to C, or maybe even assembly.
Personally I feel the C committee should've disbanded after the first standard (and the C++ one after the 2003 technical corrigendum). I didn't mind C99 much, but it looks like C(++)reeping featuritis is a nasty habit.
These gratuitous standards prompt newbies to use the new features (it's "modern") and puzzled veterans to keep up and reinternalize understanding of new variants of languages they've been using for decades. There's no real improvement, just churn. Possibly it's one of the instruments of ageism. More incompatibility with existing software and existing programmers.
Having to learn new things throughout your career isn't ageism.
The question is: which other things, and why?
One problem of this new millennium is that the field has developed a tendency to do old things with new languages instead of doing new things with old languages.
Reinventing another polygonal wheel approximation while constantly tweaking the theory used serves to segregate the experienced (who may have trouble accepting such tweaks or their necessity - they already know about the existence of wheels anyway) from the newbies (who have no previous intuitions and no taste for legitimate and spurious novelty).
Newbies are cheap, and new ideas are hard. Let's do some mental rent seeking.
This is a cool feature, but the author doesn't do himself any favors with a style of writing that greatly overestimates the importance of his own feature in the grand scheme of things. Remarks like "an extension that should've existed 40-50 years ago" make me question whether we really should have bothered all compiler vendors with implementing this 40-50 years ago. After all, you can already a) directly put your binary data in the source file, exactly as it looks after the preprocessing step, and b) read a file at runtime. I'm not saying this isn't useful, but it's a niche performance improvement rather than a core language feature.