> and I end up having all these typedefs in my projects
I avoid doing this now. It's more trouble than it's worth and it changes your code from a standard dialect of C into a custom one. Plus my eyes are old and they don't enjoy separating short identifiers.
> typedef struct { ... } String
I avoid doing this. Just use `struct string { ... };`. It makes it clear what you're handling. C23 finally gave us "auto", so you shouldn't fret over typedefing everything anymore. I also prefer a "strbuf" type with an index and capacity, so I can safely read and write to it, plus a derived "strview" type holding only a pointer and length that references into the buffer.
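A minimal sketch of what such a pair could look like. The field names and the helpers `strbuf_append` and `strbuf_view` are hypothetical, and the buffer is assumed to always satisfy `len <= cap`:

```c
#include <stddef.h>
#include <string.h>

/* Non-owning view: pointer and length only, no terminator required. */
struct strview {
    const char *ptr;
    size_t len;
};

/* Caller-provided storage: `len` is the write index, `cap` the size of `data`. */
struct strbuf {
    char *data;
    size_t len;
    size_t cap;
};

/* Append n bytes; returns 0 and leaves the buffer untouched if they do not fit. */
static int strbuf_append(struct strbuf *b, const char *src, size_t n)
{
    if (n > b->cap - b->len)
        return 0;
    memcpy(b->data + b->len, src, n);
    b->len += n;
    return 1;
}

/* Derive a view of [start, start + n), clamped to the bytes written so far. */
static struct strview strbuf_view(const struct strbuf *b, size_t start, size_t n)
{
    if (start > b->len)
        start = b->len;
    if (n > b->len - start)
        n = b->len - start;
    struct strview v = { b->data + start, n };
    return v;
}
```

The view never copies or allocates, so it is only valid while the underlying buffer is alive and not rewritten.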
> returning results
The general method of returning structures larger than two machine words is fairly inefficient. Plus you're cutting yourself off from another C23 gem, [[nodiscard]]. If you want the 'ok' value checked, you can _really_ specify that. Put everything else behind a pointer passed in as an argument. The sum type logic works just as well there.
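Roughly what that shape could look like. `parse_port` and `struct parsed_port` are made-up names, and the `NODISCARD` macro with its pre-C23 fallback for GCC/Clang is my assumption, not something from the post:

```c
#include <stdbool.h>
#include <stddef.h>

/* C23 spells the attribute [[nodiscard]]; the fallbacks below are an
   assumption for older compiler modes. */
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 202311L
#define NODISCARD [[nodiscard]]
#elif defined(__GNUC__)
#define NODISCARD __attribute__((warn_unused_result))
#else
#define NODISCARD
#endif

struct parsed_port {
    unsigned long value;
};

/* Returns true on success and fills *out; on failure *out is left untouched. */
NODISCARD static bool parse_port(const char *s, struct parsed_port *out)
{
    unsigned long v = 0;
    if (s == NULL || *s == '\0')
        return false;
    for (; *s != '\0'; s++) {
        if (*s < '0' || *s > '9')
            return false;
        v = v * 10 + (unsigned long)(*s - '0');
        if (v > 65535)
            return false;
    }
    out->value = v;
    return true;
}
```

Only the small 'ok' flag travels in the return value, and a caller that ignores it should get a compiler diagnostic.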
> I tend to avoid the string.h functions most of the time, only employing the mem family when I want to, well, mess with memory.
So you use strlen() a lot and don't have to deal with multibyte characters anywhere in your code. It's not much of a strategy.
I had a coworker who had a very complicated set of "includes" that their code relied upon—not unlike the typedefs in the post. So his code was difficult to move around without also moving all his headers with it.
I try to minimize dependencies (custom headers, custom macros, etc.).
> So you use strlen() a lot and don't have to deal with multibyte characters anywhere in your code. It's not much of a strategy.
You don't need to support every multibyte encoding (e.g. DBCS, UCS-2, UCS-4, UTF-16 or UTF-32) if you're able to normalise all input to UTF-8.
I think, when you are building a system, restricting all (human language) input to be UTF-8 is a fair and reasonable design decision, and then you can use strlen to your heart's content.
Am I missing something here? UTF-8 has multibyte characters, they're just spread across multiple bytes.
When you strlen() a UTF-8 string, you don't get the length of the string in characters, but instead the size in bytes.
Same with indices. If you index at [1] in a string with a flag emoji, you don't get a valid UTF-8 code point, but instead some part of the flag emoji. This applies to any UTF-8 code point larger than 1 byte, of which there are a lot.
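To make the byte/code point gap concrete, here is a small sketch. `utf8_codepoints` is a hypothetical helper and it assumes the input is already valid UTF-8; it counts every byte that is not a continuation byte (`10xxxxxx`):

```c
#include <stddef.h>

/* Counts code points in a NUL-terminated string assumed to be valid UTF-8:
   every byte except a continuation byte (10xxxxxx) starts a code point. */
static size_t utf8_codepoints(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    }
    return n;
}
```

For a flag emoji, which is two regional-indicator code points of four bytes each, strlen() gives 8, this gives 2, and the user sees 1 character. None of the three numbers is "the" length.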
> I think, when you are building a system, restricting all (human language) input to be UTF-8 is a fair and reasonable design decision, and then you can use strlen to your heart's content.
It makes no sense. If you only need the byte count then you can use strlen no matter what the encoding is. If you need any other kind of counting then you don't use strlen no matter what the encoding is (except in an ASCII-only environment).
"Whether I should use strlen or not" is a completely independent question to "whether my input is all UTF-8."
> I avoid doing this. Just use `struct string { ... };`. It makes it clear what you're handling.
Well then imagine if Gtk made you write `struct GtkLabel`, etc. and you saw hundreds of `struct` on the screen taking up space in heavy UI code. Sometimes abstractions are worthwhile.
> Well then imagine if Gtk made you write `struct GtkLabel`, etc. and you saw hundreds of `struct` on the screen taking up space in heavy UI code. Sometimes abstractions are worthwhile.
TBH, in that case the GtkLabel (and, indeed, the entire widget hierarchy) should be opaque pointers anyway.
If you're not using a struct as an abstraction, then don't typedef it. If you are, then hide the damn fields.
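A sketch of the "hide the fields" version, squeezed into one listing. `struct label` and its functions are hypothetical stand-ins, not Gtk API, and the split into a header and an implementation file is only marked by comments:

```c
#include <stdlib.h>
#include <string.h>

/* --- what a public header (say, label.h) would expose --- */
struct label;                                  /* incomplete type: fields hidden */
struct label *label_new(const char *text);
const char *label_text(const struct label *l);
void label_free(struct label *l);

/* --- private to the implementation file (say, label.c) --- */
struct label {
    char *text;
};

struct label *label_new(const char *text)
{
    struct label *l = malloc(sizeof *l);
    if (l == NULL)
        return NULL;
    size_t n = strlen(text) + 1;
    l->text = malloc(n);
    if (l->text == NULL) {
        free(l);
        return NULL;
    }
    memcpy(l->text, text, n);
    return l;
}

const char *label_text(const struct label *l)
{
    return l->text;
}

void label_free(struct label *l)
{
    if (l != NULL) {
        free(l->text);
        free(l);
    }
}
```

Callers that only see the header part cannot declare a `struct label` by value or touch its fields, which is exactly why this pattern pushes you towards malloc or a pool.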
Thank you! I wanted to point out exactly that. When I was a very junior programmer and coded alone, I used to have “that elemental header” with lots of things inside, many of them meant to turn C into what I wished it was.
Now I think it sits somewhere between not a good idea and absolutely awful.
Yes, sometimes you wish some things were different in a programming language: “if only these types had shorter names”. But when you work in a team, you first need consensus, and then modifying the language becomes a heavy load that every new person on the project has to lift.
“Modifying C is porting the Lisp curse to C” is my motto. Keep everything as standard and vanilla as possible.
> I’ve long been employing the length+data string struct. If there was one thing I could go back in time to change about the C language, it would be removal of the null-terminated string.
It's not necessary to go back in time. I proposed a way to do it in modern C - no existing code would break:
> the fatal error was not combining the array dimension with the array pointer; all it needs is a little new syntax a[...]; this won’t fix any existing code. Over time, the syntax a[] can be deprecated by convention and by compilers.
You're thinking in decades. The C standard committee is slower than that. This could have worked, but in practice it probably never will happen. Maybe people should start considering a language like D[1] as an alternative, which seems to have the spirit of both C and Go, but with much more pragmatism than either.
Meanwhile, once UNIX was done at AT&T, the C language authors hardly cared about the C standard committee when it came to the C compiler features used in Plan 9 and Inferno, which were only "mostly" compatible. They followed that up by taking an authoring role in Alef, Limbo and Go.
> The language accepted by the compilers is the core ANSI C language with some modest extensions, a greatly simplified preprocessor, a smaller library that includes system calls and related facilities, and a completely different structure for include files.
The C committee is not afraid to add new syntax. And this is an easy addition.
Not only does it deliver a massive safety improvement, it dramatically speeds up strlen, strcmp, strcpy, strcat, etc. And you can pick out a substring without needing to allocate/copy. It's easy money.
As I see it, the problem with languages trying to replace C is that they not only try to fix fundamental flaws, but feel compelled to add unneeded features and break C's simplicity.
Even simpler, you can do something like this to have length-delimited AND null-terminated strings (written from memory, no guarantees of correctness etc.):
Please don’t buy into “no const”. If you’ve ever worked with a lot of C/C++ code, you really appreciate proper const usage and it’s very obvious if a prototype is written incorrectly because now any callers will have errors. No serious reusable library would expose functions taking char* without proper const usage. You would never be able to pass a C++ string c_str() to such a C function without a const_cast if that were the case. Casting away const is and should be an immediate smell.
I'm a huge fan of the 'parse, don't validate' idiom, but it feels like a bit of a hurdle to use it in C - in order to really encapsulate and avoid errors, you'd need to use opaque pointers to hidden types, which requires the use of malloc (or an object pool per-type or some other scaffolding, that would get quite repetitive after a while, but I digress).
You basically have to trade performance for correctness, whereas in a language like C++, that's the whole purpose of the constructor, which works for all kinds of memory: auto, static, dynamic, whatever.
In C, to initialize a struct without dynamic memory, you could always do the following:
    struct Name {
        const char *name;
    };

    int parse_name(const char *name, struct Name *ret) {
        if (name) {
            ret->name = name;
            return 1;
        } else {
            return 0;
        }
    }

    // in user code, *hopefully*...
    struct Name myname;
    parse_name("mothfuzz", &myname);
But then anyone could just instantiate an invalid Name without calling the parse_name function and pass it around wherever. This is very close to 'validation' type behaviour. So to get real 'parsing' behaviour, dynamic memory is required, which is off-limits for many of the kinds of projects one would use C for in the first place.
I'm very curious as to how the author resolves this, given that they say they don't use dynamic memory often. Maybe there's something I missed while reading.
> But then anyone could just instantiate an invalid Name without calling the parse_name function and pass it around wherever
This is nothing new in C. This problem has always existed by virtue of all struct members being public. Generally, programmers know to search the header file / documentation for constructor functions, instead of doing raw struct instantiation. Don't underestimate how good documentation can drive correct programming choices.
C++ is worse in this regard, as constructors don't really allow this pattern, since they can't return a None / false. The alternative is to throw an exception, which requires a runtime similar to malloc.
In C++ you would have a protected constructor and related friend utility class to do the parsing, returning any error code, and constructing the thing, populating an optional, shared_ptr, whatever… don’t make constructors fallible.
> In the absence of proper language support, “sum types” are just structs with discipline.
With enough compiler support they could be more than that. For example, I submitted a tagged union analysis feature request to gcc and clang, and someone generalized it into a guard builtin.
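For reference, the "struct with discipline" version that such analysis would be checking looks something like this. All names here are hypothetical; the only rule is that nobody touches the union except through accessors that test the tag first:

```c
#include <stdbool.h>

enum value_tag { VALUE_INT, VALUE_TEXT };

struct value {
    enum value_tag tag;
    union {
        long i;
        const char *text;
    } as;
};

static struct value value_int(long i)
{
    struct value v;
    v.tag = VALUE_INT;
    v.as.i = i;
    return v;
}

static struct value value_text(const char *text)
{
    struct value v;
    v.tag = VALUE_TEXT;
    v.as.text = text;
    return v;
}

/* Accessors check the tag before reading the matching union member. */
static bool value_get_int(const struct value *v, long *out)
{
    if (v->tag != VALUE_INT)
        return false;
    *out = v->as.i;
    return true;
}

static bool value_get_text(const struct value *v, const char **out)
{
    if (v->tag != VALUE_TEXT)
        return false;
    *out = v->as.text;
    return true;
}
```

Nothing stops someone from reading `v.as.i` directly, which is the gap a compiler or static-analysis check would close.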
With proper discipline, one can even program a Turing machine directly. The problems are two: (1) Doing so is very slow and arduous, and (2) a chance of making a dangerous error is still quite high.
For instance, it appears that no amount of proper discipline, even in the best developers, can make a naked pointer to a memory area a substitute for proper array support.
The compiler's job is to program the Turing machine for us. It should help as much as possible. For example, I really like using enums because compilers have extensive support for checking that all values have been handled in switch statements.
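A tiny illustration of that enum check, with made-up names. The trick is to leave out the `default:` label so that GCC and Clang's `-Wswitch` (enabled by `-Wall`) can flag a missing case when the enum grows:

```c
enum shape_kind { SHAPE_CIRCLE, SHAPE_SQUARE, SHAPE_TRIANGLE };

static const char *shape_name(enum shape_kind k)
{
    /* No default label on purpose: adding an enumerator later without
       handling it here produces a -Wswitch warning. */
    switch (k) {
    case SHAPE_CIRCLE:   return "circle";
    case SHAPE_SQUARE:   return "square";
    case SHAPE_TRIANGLE: return "triangle";
    }
    return "unknown"; /* only reached for values outside the enum */
}
```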
I don't like it when compilers start getting in the way though. We use C because we want to do raw things like point a structure at some memory area in order to access the data stored there. The compiler's job is to generate the expected code without screwing it up by "optimizing" it beyond recognition because of strict aliasing or some other nonsense.
You can certainly wrap the array in a structure which provides either bounds information, to be checked with generic runtime functions, or specific function pointers (methods) to get and set.
You can paper over _a lot_ of C's faults. Ultimately it's not really worth it, but it's not nearly as fragile and arduous as you make it out to be.
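The bounds-information variant of that wrapper might look like the sketch below. `struct int_slice`, `slice_get` and `slice_set` are hypothetical names, and the checks are plain runtime functions rather than function pointers:

```c
#include <stdbool.h>
#include <stddef.h>

/* Pointer plus element count, passed around together. */
struct int_slice {
    int *data;
    size_t len;
};

/* Reads element i into *out; returns false instead of reading out of bounds. */
static bool slice_get(struct int_slice s, size_t i, int *out)
{
    if (i >= s.len)
        return false;
    *out = s.data[i];
    return true;
}

/* Writes v to element i; returns false instead of writing out of bounds. */
static bool slice_set(struct int_slice s, size_t i, int v)
{
    if (i >= s.len)
        return false;
    s.data[i] = v;
    return true;
}
```

It works, but every access site has to opt in, which is the "discipline" part of the argument.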
FWIW, Coverity (maybe others) has a checker that creates an error if it detects tagged union access without first checking the tag. It’s not as strict as enforcing which fields belong to which tag values, but it can still be useful. I’d much rather have what was proposed in the GCC bug!
If you really insist on not having a distinction between "u8"/"i8" and "unsigned char"/"signed char", and you've gone to the trouble of refusing to accept CHAR_BIT!=8, I'm pretty sure it'd be safer to typedef unsigned char u8 and typedef signed char i8. uint8_t/int8_t are not necessarily character types (see 6.2.5.20 and 7.22.1.1) and there are ramifications (see, e.g., 6.2.6.1, 6.3.2.3, 6.5.1).
With the disclaimer that I let my language lawyer qualification lapse a while ago, it's broadly to do with the character types being the only approved way to examine the bytes of an object. An object of a type can be accessed only as if it were an object of that type or some compatible type, but: it can also be accessed as a sequence of characters. (You'd do this if implementing memcpy, memset or memcmp, for example.)
6.2.6.1 - only character types can be used to inspect the sequence of bytes making up an object, and (interestingly) only an array of unsigned char is suitable for memcpy'ing an object into for inspection. It's possible for sequences of bytes to exist that don't represent a valid value of the original object; it's undefined behaviour to read those sequences of bytes other than via a character type (i.e., I think, via a pointer to something compatible with the object's actual type - there being no other valid ways to even attempt to read it)
6.3.2.3 - when casting a pointer to an object type to a pointer to a character type, the new character pointer points to the bytes of the object. If converting between object types, on the other hand, the original pointer will (with care) round trip, and that seems to be all you can do.
6.5.1 - as well as all the expected ways of accessing an object, objects can be accessed via a character pointer
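In code, the character-type exemption those sections describe amounts to something like this sketch (`object_bytes` is a hypothetical name; it is essentially a hand-rolled memcpy):

```c
#include <stddef.h>

/* Copies the object representation of *obj (size bytes) into out by
   reading it through a pointer to unsigned char, which is permitted
   for an object of any type. */
static void object_bytes(const void *obj, size_t size, unsigned char *out)
{
    const unsigned char *p = obj;
    for (size_t i = 0; i < size; i++)
        out[i] = p[i];
}
```

Swap `unsigned char` for some other integer type in that loop and the same access is no longer covered by the exemption, even when the type happens to be 8 bits wide.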
> and you've gone to the trouble of refusing to accept CHAR_BIT!=8
This one was a head-scratcher for me. Yeah, there's no cost to check for it, but architectures where CHAR_BIT != 8 are rarer even than 24-bit architectures.
I got the impression the author was implying because CHAR_BIT is enforced to be 8 that uint8_t and char are therefore equivalent, but they are different types with very different rules.
E.g. `char *p = (char *)&astruct` is guaranteed legal, but `uint8_t *p = (uint8_t *)&astruct` may violate strict aliasing if uint8_t is not a character type. Then there's modulo, traps, padding, overflow, promotion, etc.
> I don’t personally do things that require dynamic memory management in C often, so I don’t have many practices for it. I know that wellons & co. have been really liking the arena, and I’d probably like it too if I actually used the heap often. But I don’t, so I have nothing to say.
> If I find myself needing a bunch of dynamic memory allocations and lifetime management, I will simply start using another language–usually rust or C#.
I'm not sure what the modern standards are, but if you are writing in C, pre-allocate as much as possible. Any kind of garbage collection is just extra processing time and ideally you don't want to run out of memory during an allocation mid-execution.
People may frown at C, but nothing beats getting your inner loops into CPU cache. If you can avoid extra fetches from RAM, you can really crank out some processing power. Example projects have included computer vision, servers, and a custom neural network, all of which had no business being so fast.
Solid list. The bit about avoiding the preprocessor as much as possible really resonates—using `static inline` functions and `enum` instead of macros makes debugging so much less painful. What's your take on using C11's `_Generic` for type-generic macros? It adds some verbosity but can save you from a lot of runtime type errors.
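For anyone who hasn't used it, a minimal `_Generic` sketch. The `max_of` macro and the three helpers are made-up names; the selection is done on the type of `(a) + (b)`, so mixed arguments follow the usual arithmetic conversions and an unsupported type (say, a pointer or a float) fails at compile time:

```c
static int    max_int(int a, int b)          { return a > b ? a : b; }
static long   max_long(long a, long b)       { return a > b ? a : b; }
static double max_double(double a, double b) { return a > b ? a : b; }

/* Picks the helper from the type of (a) + (b); no matching association
   means a compile error rather than a silent conversion. */
#define max_of(a, b) _Generic((a) + (b), \
        int:    max_int,                 \
        long:   max_long,                \
        double: max_double)((a), (b))
```

The dispatch happens entirely at compile time, and the real work still lives in ordinary functions that a debugger can step into.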
Regarding memory, I recently changed to try to not use dynamic memory, or if I need to, to do it once at startup. Often static memory on startup is sufficient.
Instead use the stack much more and have a limit on how much data the program can handle fixed on startup. It adds the need to think what happens if your system runs out of memory.
Like OP said, it's not a solution for all types of programs. But it makes for very stable software with known and easily tested error states. Also adds a bit of fun in figuring out how to do it.
As someone who spent most of their career as an embedded dev, yes, this is fine for (like parent said) some types of software.
Even in places where you'd think this is a bad idea, it can still be a good approach, for example allocating and mapping all memory up to the limit you are designing for. Honestly this is how engineering is done - you have specified limits in the design, and you work explicitly to those limits.
So "allocate everything at startup" need not be "allocate everything at program startup", it can be "allocate everything at workflow startup", where "workflow" can be a thread, a long-running input-directed sequence of functions, etc.
For example, I am starting a tiny stripped down web-server for a project, and my approach is going to be a single 4 KB[1] block for each request, allocated via a pool (which can expand on pressure up to some maximum) and returned to the pool once the response is sent.
The 4 KB includes at most 14 headers (regardless of each header's size) with the remaining data for the JSON payload. The JSON payload is limited to at most 10 fields. This makes parsing everything "allocate-less" because the array holding pointers to the keys+values of the header is `const char *headers[14]` and to the payload JSON data `const char *fields[10]`.
A request that doesn't fit in any of that will be rejected. This means that everything is simple and the allocation for each request happens once at startup (pool creation) even while parsing the input.
I'm toying with the idea of doing the same for responses too, instead of writing it out as and when the output is determined during the servicing of the request.
-------------------------
[1] I might switch to 6 KB or 8 KB if requests need more; whatever number is chosen, it's going to be a static number.
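A rough sketch of the per-request block pool described above, under simplifying assumptions: the pool here is a fixed static array with a free list (no expansion under pressure), it is not thread-safe, and `REQ_POOL_COUNT` plus all the function names are hypothetical:

```c
#include <stddef.h>

#define REQ_BLOCK_SIZE  4096
#define REQ_MAX_HEADERS 14
#define REQ_MAX_FIELDS  10
#define REQ_POOL_COUNT  8   /* hypothetical cap on in-flight requests */

struct request {
    struct request *next_free;             /* free-list link while unused */
    const char *headers[REQ_MAX_HEADERS];  /* pointers into data[] */
    const char *fields[REQ_MAX_FIELDS];    /* pointers into data[] */
    char data[REQ_BLOCK_SIZE];             /* raw request bytes */
};

static struct request pool[REQ_POOL_COUNT];
static struct request *free_list;

/* Called once at startup: thread every block onto the free list. */
static void pool_init(void)
{
    free_list = NULL;
    for (size_t i = 0; i < REQ_POOL_COUNT; i++) {
        pool[i].next_free = free_list;
        free_list = &pool[i];
    }
}

/* Returns NULL when the pool is exhausted; the caller rejects the request. */
static struct request *pool_acquire(void)
{
    struct request *r = free_list;
    if (r != NULL)
        free_list = r->next_free;
    return r;
}

/* Hands the block back once the response has been sent. */
static void pool_release(struct request *r)
{
    r->next_free = free_list;
    free_list = r;
}
```

Acquire and release are a couple of pointer moves each, and running out of blocks is an ordinary, testable error path instead of a failed malloc.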
Dynamic memory allocation solves the problem of dynamic business requirements.
If you know your requirements up front, static memory initialisation is the way.
For instance, indexing a typed array with an enum is no different than an unordered map of string to int, IF you have all your business requirements up front.
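Something like the following, where the enum plays the role of the map's key set. The task names and retry numbers are invented for illustration:

```c
enum task { TASK_FETCH, TASK_PARSE, TASK_STORE, TASK_COUNT };

/* Statically initialised "map" from task to retry limit (values made up). */
static const int task_retries[TASK_COUNT] = {
    [TASK_FETCH] = 3,
    [TASK_PARSE] = 0,
    [TASK_STORE] = 5,
};

/* Lookup is a bounds check plus an array index; -1 for unknown keys. */
static int retries_for(enum task t)
{
    if ((int)t < 0 || (int)t >= TASK_COUNT)
        return -1;
    return task_retries[t];
}
```

No hashing, no allocation, and a new key is a new enumerator plus one initialiser line.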
In recent years I had to write some firmware code with C and that was exactly the approach I took. So far I never had need for any dynamic memory and I was surprised how far I can get without it.
This is the way. Allocate all memory upfront. Create an allocator if you need to divvy it up dynamically. Acquire all resources up front. Try to fit everything on the stack. Much easier that way.
Only allocate on the heap if you absolutely have to.
I've been looking into Ada recently and it has cool safety mechanisms to encourage this same kind of thing. It even allows you to dynamically allocate on the stack for many cases.
I have some firmware that runs an event loop. There is no malloc anywhere. But I do have an area which gets reset after each event handler call. Useful for passing objects up the call stack.
One other thing I tend to do: anything that needs to live longer than the current call stack gets copied into a queue of some sort. I feel it's kind of doing manually what Rust's borrow checker tries to enforce.
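My guess at what such a resettable area looks like, written as a plain bump allocator. The size, the names and the alignment policy (round everything up to `max_align_t`) are assumptions, not the commenter's code:

```c
#include <stddef.h>

#define ARENA_SIZE 1024   /* hypothetical; a multiple of the alignment */

struct arena {
    _Alignas(max_align_t) unsigned char mem[ARENA_SIZE];
    size_t used;
};

/* Bump-allocates size bytes rounded up to max_align_t alignment;
   returns NULL when the arena is full. */
static void *arena_alloc(struct arena *a, size_t size)
{
    const size_t align = _Alignof(max_align_t);
    size_t rounded = (size + align - 1) / align * align;
    if (rounded < size || rounded > ARENA_SIZE - a->used)
        return NULL;
    void *p = a->mem + a->used;
    a->used += rounded;
    return p;
}

/* Called after each event handler returns: everything is freed at once. */
static void arena_reset(struct arena *a)
{
    a->used = 0;
}
```

Anything that must outlive the handler has to be copied out before the reset, which is where the queue comes in.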
Not distracting at all, it feels nostalgic to me. I'd rather have these flashy things than a million popups and registration forms following you around, which is basically the modern web. I hate it so much. This site is pure balsam for my soul.
Two things I thought while reading the post:
Why not typedef _BitInt types for stricter size and accidental promotion control when typedeffing for easier names anyway?
I came across a post mentioning using regular arrays instead of strings to avoid the null terminator and off-by-one pitfalls.
I still have a lot of conversion to do before I can try this in my hobby project, but these are interesting ideas.
Even if the code might not end up requiring it, if you write it with the assumption that bytes are 8 bits, it's good to document that with a static assert so someone porting things knows there will be dragons
It's a pretty neat way to drop some corner cases from your mental load without building subtle traps
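The assert in question is a one-liner. The `u8`/`i8` typedefs below follow the character-type suggestion made elsewhere in the thread and are only there to show where the assumption gets used:

```c
#include <limits.h>

/* Fails the build on any platform where a byte is not 8 bits wide. */
_Static_assert(CHAR_BIT == 8, "this code assumes 8-bit bytes");

/* Character types, so they keep the byte-inspection guarantees. */
typedef unsigned char u8;
typedef signed char i8;
```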
> Additionally, the intent of whether the buffer is used as “raw” memory chunks versus a meaningful u8 is pretty clear from the code that it gets used in, so I’m not worried about confusing intent with it.
It's generally not clear to the compiler, and that can result in missed optimization opportunities.
I really dislike parsing not validating as general advice. IMO this is the true differentiator of type systems that most people should be familiar with instead of "dynamic vs static" or "strong vs weak".
Adding complexity to your type system and to the representation of types within your code has a cost in terms of mental overhead. It's become trendy to have this mental model where the cost of "type safety" is paid in keystrokes but pays for itself in reducing mental overhead for the developers. But in reality you're trading one kind of mental overhead for another, the cost you pay to implement it is extra.
It's like "what are all the ways I could use this wrong" vs "what are all the possibilities that exist". There's no difference in mental overhead between having one tool you can use in 500 ways or 500 tools you can use in 1 way, either way you need to know 500 things, so the difference lies elsewhere. The effort and keystrokes that you use to add type safety can only ever increase the complexity of your project.
If you're going to pay for it, that complexity has to be worth it. Every single project should be making a conscious decision about this on day one. For the cost to be worth it, the rate of iteration has to be low enough and the cost of runtime bugs has to be high enough. Paying the cost is a no brainer on a banking system, spacecraft or low level library depended on by a million developers.
Where I think we've lost the plot is that NOT paying the cost should be a no brainer for stuff like front end web development and video games where there's basically zero cost in small bugs. Typescript is a huge fuck up on the front end, and C++ is a 30 year fuck up in the games industry. Javascript and C have problems and aren't the right languages for those respective jobs, but we completely missed the point of why they got popular and didn't learn anything from it, and we haven't created the right languages yet for either of those two fields.
Same concept and cost/benefit analysis applies to all forms of testing, and formal verification too.
While I broadly agree with your general point, in that engineering is making a set of trade-offs, I don't necessarily agree that ditching type-safety in the example contexts you posted is the appropriate trade-off.[1]
I'll ditch type-safety in experimental/exploratory code; I'll use Lisp (or, more recently, Python) to test if something is a good idea. For anything that ships to production, I think a basic level of type enforcement is necessary, even if you don't want the whole type zoo.
For your Javascript f/end context, I like the proposed TC39 approach (https://github.com/tc39/proposal-type-annotations?tab=readme...). The typing is optional, does not break existing syntax and can still be used to enforce a basic level of type safety if the developer wants it.
----------------------------
[1] I upvoted you anyway. Your broader point is still valid.
I'm not talking about ditching type safety. I'm saying the whole concept of "safe" and "unsafe" as most people on HN understand it is flawed. The interesting part of a type system isn't whether the compiler checks types or if we just go lmao fuck it let's not even bother, it's whether or not you need to represent the types in your code in order for the compiler to check them. For the majority of what people want from type safety in a language like Javascript, the answer is that no, you don't need to, as long as you're willing to not have every single language feature under the sun.
With compiled languages you can statically infer a ton of type information without having to pepper your codebase with repeated references to what something is. Nominal typing essentially boils down to a double-check of your work, you specify the type separately and then purposely assign it to a variable, so that if you make a mistake with either part the compiler picks it up.
But those kinds of double-checks can be done for almost anything (outside of dynamic boundaries like io/dlls) without nominal type signatures in the code, as long as you jettison the ability to change types at runtime. No language as far as I can tell actually does this because we're all so obsessed with the false dichotomy of nominal and dynamic typing.
In JS everyone likes to use string unions in place of enums so let's use that as an example. If you have something that is only ever set as "foo" or "bar", that's effectively a boolean. If you receive that string in another function, make a typo and write if (str == "boo"), then in every single language I'm aware of that passes a compiler check. But it shouldn't, because the compiler has all the information it needs to statically catch that error and fail the build. The set of assignments to that variable and the set of equality checks on it provide the two parts of the double-check.
In a perfect world we'd have 10 of these "middle of the road" strongly typed static languages to choose from that all optimise for minimal type representation in their own unique way. But every time I see one of these projects pop up on HN it gets like 10 comments then disappears into the sunset because the programming community is so enraptured with the nominal type system of C and all the fucking bullshit Bjarne Stroustrup pasted on top of it 40 years ago. So we end up with this silly situation where the only things considered "safe" by the crowd are strict descendants of C/C++ with the array/pointer/string screw-ups that made those languages unsafe removed.
> I think one of the most eye-opening blog posts I read when getting into programming initially was the evergreen parse, don’t validate post
Bro, that was written in 2019. If it's not old enough to drink it's not yet evergreen. But it's also long-winded. A 25-minute read, and y'know what the conclusion is? "Parsing leaves you with a new data structure matching a type, validation checks if some data technically complies with a type (but might not later be parsed correctly)".
I need all the baby programmers in the back to hear me: type systems are bikeshedding. The point of a type is only to restrict computation to a fixed set. This concept can be applied anywhere you need to ensure reliability and simplicity. You don't need a programming language to natively support types in order to implement the concept yourself in that language.
> You don't need a programming language to natively support types in order to implement the concept yourself in that language.
In a programming language that doesn't enforce types, how do you implement
> "Parsing leaves you with a new data structure matching a type, validation checks if some data technically complies with a type (but might not later be parsed correctly)".
It is like those folks who would rather write JSDoc comments than use a linter like TypeScript, because reasons.
Given C++ adoption in 1990s commercial software and major consumer operating systems (Apple, IBM, Microsoft, Be), I bet that if the FSF had not advocated for C in its coding guidelines, C adoption would not have taken off beyond those days.
"Using a language other than C is like using a non-standard feature: it will cause trouble for users. Even if GCC supports the other language, users may find it inconvenient to have to install the compiler for that other language in order to build your program. So please write in C."
I was going to comment the same thing.
I had a coworker who had a very complicated set of "includes" that their code relied upon—not unlike the typedefs in the post. So his code was difficult to move around without also moving all his headers with it.
I try to minimize dependencies (custom headers, custom macros, etc.).
Am I missing something here? UTF-8 has multibyte characters, they're just spread across multiple bytes.
When you strlen() a UTF-8 string, you don't get the length of the string in characters, but instead the size in bytes.
Same with indices. If you index at [1] in a string with a flag emoji, you don't get a valid UTF-8 code point, but instead some part of the flag emoji. This applies to any UTF-8 code point larger than 1 byte, of which there are a lot.
UTF16 or UTF32 are just different encodings.
What am I missing?
That's why UTF8 libraries exist.
The main thing I dislike about typedefs is that you can't forward declare them.
If I know for sure I'm never going to need to do that then OK.
> I’ve long been employing the length+data string struct. If there was one thing I could go back in time to change about the C language, it would be removal of the null-terminated string.
It's not necessary to go back in time. I proposed a way to do it in modern C - no existing code would break:
https://www.digitalmars.com/articles/C-biggest-mistake.html
It's simple, and easy to implement.
> the fatal error was not combining the array dimension with the array pointer; all it needs is a little new syntax a[...]; this won’t fix any existing code. Over time, the syntax a[] can be deprecated by convention and by compilers.
You're thinking in decades. The C standard committee is slower than that. This could have worked in practice, but it will probably never happen. Maybe people should start considering a language like D[1] as an alternative, which seems to have the spirit of both C and Go, but with much more pragmatism than either.
[1] https://en.wikipedia.org/wiki/D_(programming_language)#Criti...
There is some irony in someone replying to the author of the D language suggesting that maybe the D language is the real solution he's looking for.
The C standard committee even refused Dennis Ritchie's proposal for fat pointers.
https://www.nokia.com/bell-labs/about/dennis-m-ritchie/varar...
Meanwhile, after UNIX was done at AT&T, the C language authors hardly cared about the C standard committee when it came to the C compiler features used in Plan 9 and Inferno, which were only "mostly" compatible, and they followed up with authoring roles in Alef, Limbo and Go.
> The language accepted by the compilers is the core ANSI C language with some modest extensions, a greatly simplified preprocessor, a smaller library that includes system calls and related facilities, and a completely different structure for include files.
https://doc.cat-v.org/plan_9/4th_edition/papers/comp
I doubt most C advocates ever reflect on this.
The C committee is not afraid to add new syntax. And this is an easy addition.
Not only does it deliver a massive safety improvement, it dramatically speeds up strlen, strcmp, strcpy, strcat, etc. And you can pick out a substring without needing to allocate/copy. It's easy money.
As I see it, the problem with languages trying to replace C is that they not only try to fix fundamental flaws, but feel compelled to add unneeded features and break C's simplicity.
Even simpler, you can do something like this to have length-delimited AND null-terminated strings (written from memory, no guarantees of correctness etc.):
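One possible shape for this, as a sketch and not necessarily the commenter's exact code. It assumes a C99 flexible array member, and `struct lstring` and `lstring_new` are hypothetical names:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Length header followed by the bytes and a trailing NUL, in one allocation. */
struct lstring {
    size_t len;
    char   data[];   /* flexible array member (C99) */
};

struct lstring *lstring_new(const char *src)
{
    assert(src != NULL);
    size_t len = strlen(src);
    struct lstring *s = malloc(sizeof *s + len + 1);
    if (s == NULL)
        return NULL;
    s->len = len;
    memcpy(s->data, src, len + 1);   /* copies the terminator too */
    return s;
}
```

`s->data` can be handed to any function expecting a C string, while `s->len` avoids repeated strlen calls. The caller releases it with free().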
One of the advantages to the pointer + length approach is free substrings. This inline approach doesn't allow that.
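A small sketch of what "free substrings" means with pointer + length. The `strview` name comes from upthread; `sv_from_cstr` and `sv_sub` are made up for illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct strview {
    const char *ptr;
    size_t      len;
};

struct strview sv_from_cstr(const char *s)
{
    assert(s != NULL);
    return (struct strview){ s, strlen(s) };
}

/* Returns the view [start, start + count), clamped to the source view.
   No allocation and no copy: the result points into the same bytes. */
struct strview sv_sub(struct strview v, size_t start, size_t count)
{
    if (start > v.len)
        start = v.len;
    if (count > v.len - start)
        count = v.len - start;
    return (struct strview){ v.ptr + start, count };
}
```

The catch is that a view is not NUL-terminated in general, so it can't be passed straight to functions that expect a C string, and it must not outlive the buffer it points into.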
https://web.archive.org/web/20260116161616/https://www.digit... for anyone here while we're swamping Walter's site
The site is built out of static pages, so it takes a lot to swamp it!
Please don’t buy into “no const”. If you’ve ever worked with a lot of C/C++ code, you really appreciate proper const usage and it’s very obvious if a prototype is written incorrectly because now any callers will have errors. No serious reusable library would expose functions taking char* without proper const usage. You would never be able to pass a C++ string c_str() to such a C function without a const_cast if that were the case. Casting away const is and should be an immediate smell.
Where is the author advocating not using const or casting it away?
“modified 2026-01-17T23:20:00Z”
Seems it was cast away
I'm a huge fan of the 'parse, don't validate' idiom, but it feels like a bit of a hurdle to use it in C - in order to really encapsulate and avoid errors, you'd need to use opaque pointers to hidden types, which requires the use of malloc (or an object pool per-type or some other scaffolding, that would get quite repetitive after a while, but I digress).
You basically have to trade performance for correctness, whereas in a language like C++, that's the whole purpose of the constructor, which works for all kinds of memory: auto, static, dynamic, whatever.
In C, to initialize a struct without dynamic memory, you could always do the following:
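Something along these lines, as a hedged sketch. The `Name` type and `parse_name` are the names used in this comment, but the field layout and the validation rule (non-empty, at most 31 bytes) are assumptions:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum { NAME_CAP = 31 };

typedef struct {
    char   text[NAME_CAP + 1];
    size_t len;
} Name;

/* Fills *out and returns true only if s is non-empty and fits. */
bool parse_name(const char *s, Name *out)
{
    assert(s != NULL && out != NULL);
    size_t len = strlen(s);
    if (len == 0 || len > NAME_CAP)
        return false;
    memcpy(out->text, s, len + 1);
    out->len = len;
    return true;
}
```

The caller declares `Name n;` on the stack (or statically) and checks the return value before using it.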
But then anyone could just instantiate an invalid Name without calling the parse_name function and pass it around wherever. This is very close to 'validation' type behaviour. So to get real 'parsing' behaviour, dynamic memory is required, which is off-limits for many of the kinds of projects one would use C for in the first place.
I'm very curious as to how the author resolves this, given that they say they don't use dynamic memory often. Maybe there's something I missed while reading.
You can play tricks if you’re willing to compromise on the ABI:
Implementation (size checks, etc. elided):
Caller:
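A rough sketch of what this kind of trick can look like, not the commenter's original code. The header exposes only a correctly sized and aligned byte buffer, and the real layout lives in the implementation file. The 32-byte size, `struct name_impl` and the `name_*` functions are all assumptions:

```c
#include <assert.h>
#include <stdalign.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* --- public header: size and alignment are visible, fields are not --- */
typedef struct {
    alignas(max_align_t) unsigned char opaque[32];
} Name;

bool        name_parse(Name *out, const char *s);
const char *name_cstr(const Name *n);

/* --- implementation file --- */
struct name_impl {
    size_t len;
    char   text[16];
};

static_assert(sizeof(struct name_impl) <= sizeof(Name), "Name storage too small");

bool name_parse(Name *out, const char *s)
{
    struct name_impl impl = {0};
    size_t len = strlen(s);
    if (len == 0 || len >= sizeof impl.text)
        return false;
    impl.len = len;
    memcpy(impl.text, s, len + 1);
    memcpy(out->opaque, &impl, sizeof impl);  /* byte copy avoids aliasing issues */
    return true;
}

const char *name_cstr(const Name *n)
{
    return (const char *)n->opaque + offsetof(struct name_impl, text);
}
```

A caller writes `Name n; if (name_parse(&n, input)) puts(name_cstr(&n));` with no malloc involved. The compromise is that `sizeof(Name)` is now part of the ABI, so growing the hidden struct past 32 bytes breaks existing binaries.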
> But then anyone could just instantiate an invalid Name without calling the parse_name function and pass it around wherever
This is nothing new in C. This problem has always existed by virtue of all struct members being public. Generally, programmers know to search the header file / documentation for constructor functions, instead of doing raw struct instantiation. Don‘t underestimate how good documentation can drive correct programming choices.
C++ is worse in this regard, as constructors don‘t really allow this pattern, since they can‘t return a None / false. The alternative is to throw an exception, which requires a runtime similar to malloc.
In C++ you can do:

    struct Foo {
    private:
      int val = 0;
      Foo(int newVal) : val(newVal) {}
    public:
      static optional<Foo> CreateFoo(int newVal) {
        if (newVal != SENTINEL_VALUE) { return Foo(newVal); }
        return {};
      }
    };
In C++ you would have a protected constructor and related friend utility class to do the parsing, returning any error code, and constructing the thing, populating an optional, shared_ptr, whatever… don’t make constructors fallible.
> In the absence of proper language support, “sum types” are just structs with discipline.
With enough compiler support they could be more than that. For example, I submitted a tagged union analysis feature request to gcc and clang, and someone generalized it into a guard builtin.
https://github.com/llvm/llvm-project/issues/74205
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112840
GCC proved to be too complex for me to hack this in though. To this day I'm hoping someone better than me will implement it.
With proper discipline, one can even program a Turing machine directly. The problems are two: (1) Doing so is very slow and arduous, and (2) a chance of making a dangerous error is still quite high.
For instance, it appears that no amount of proper discipline, even in the best developers, lets a naked pointer to a memory area stand in for proper array support.
The compiler's job is to program the Turing machine for us. It should help as much as possible. For example, I really like using enums because compilers have extensive support for checking that all values have been handled in switch statements.
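A small sketch of that enum/switch pattern, with hypothetical names. It relies on `-Wswitch`, which GCC and Clang enable as part of `-Wall`:

```c
#include <assert.h>

enum color { COLOR_RED, COLOR_GREEN, COLOR_BLUE };

/* No default case on purpose: with -Wall the compiler warns when a new
   enumerator is added and not handled in this switch. */
const char *color_name(enum color c)
{
    switch (c) {
    case COLOR_RED:   return "red";
    case COLOR_GREEN: return "green";
    case COLOR_BLUE:  return "blue";
    }
    assert(!"color value outside the enum");
    return "?";
}
```

Adding a `default:` label would silence the warning, so leaving it out is the whole point.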
I don't like it when compilers start getting in the way though. We use C because we want to do raw things like point a structure at some memory area in order to access the data stored there. The compiler's job is to generate the expected code without screwing it up by "optimizing" it beyond recognition because of strict aliasing or some other nonsense.
You can certainly wrap the array with a structure which provides either bounds information to be checked with generic runtime functions, or specific function pointers (methods) to get and set.
You can paper over _a lot_ of C's faults. Ultimately it's not really worth it, but it's not nearly as fragile and arduous as you make it out to be.
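A minimal sketch of the first variant (bounds information checked by runtime functions). `struct int_span`, `span_get` and `span_set` are hypothetical names:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct int_span {
    int   *data;
    size_t len;
};

/* Reads element i into *out; returns false instead of reading out of bounds. */
bool span_get(struct int_span s, size_t i, int *out)
{
    assert(out != NULL);
    if (i >= s.len)
        return false;
    *out = s.data[i];
    return true;
}

/* Writes element i; returns false instead of writing out of bounds. */
bool span_set(struct int_span s, size_t i, int value)
{
    if (i >= s.len)
        return false;
    s.data[i] = value;
    return true;
}
```

The weak spot is the same as always: nothing forces callers to go through these functions instead of touching `s.data` directly.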
FWIW, Coverity (maybe others) has a checker that creates an error if it detects tagged union access without first checking the tag. It’s not as strict as enforcing which fields belong to which tag values, but it can still be useful. I’d much rather have what was proposed in the GCC bug!
If you really insist on not having a distinction between "u8"/"i8" and "unsigned char"/"signed char", and you've gone to the trouble of refusing to accept CHAR_BIT!=8, I'm pretty sure it'd be safer to typedef unsigned char u8 and typedef signed char i8. uint8_t/int8_t are not necessarily character types (see 6.2.5.20 and 7.22.1.1) and there are ramifications (see, e.g., 6.2.6.1, 6.3.2.3, 6.5.1).
Could you clarify an example of the ramifications?
I tried looking through the C2Y standard draft to figure it out, but it's too complicated for me.
With the disclaimer that I let my language lawyer qualification lapse a while ago, it's broadly to do with the character types being the only approved way to examine the bytes of an object. An object of a type can be accessed only as if it were an object of that type or some compatible type, but: it can also be accessed as a sequence of characters. (You'd do this if implementing memcpy, memset or memcmp, for example.)
6.2.6.1 - only character types can be used to inspect the sequence of bytes making up an object, and (interestingly) only an array of unsigned char is suitable for memcpy'ing an object into for inspection. It's possible for sequences of bytes to exist that don't represent a valid value of the original object; it's undefined behaviour to read those sequences of bytes other than via a character type (i.e., I think, via a pointer to something compatible with the object's actual type - there being no other valid ways to even attempt to read it)
6.3.2.3 - when casting a pointer to an object type to a pointer to a character type, the new character pointer points to the bytes of the object. If converting between object types, on the other hand, the original pointer will (with care) round trip, and that seems to be all you can do.
6.5.1 - as well as all the expected ways of accessing an object, objects can be accessed via a character pointer
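In code, the practical upshot looks roughly like this. It's a sketch with hypothetical helper names, going through unsigned char and memcpy so that it doesn't depend on what uint8_t happens to be:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Copies the object representation of a uint32_t into an unsigned char
   array, which is always a permitted way to look at its bytes. */
void u32_bytes(uint32_t value, unsigned char out[sizeof(uint32_t)])
{
    assert(out != NULL);
    memcpy(out, &value, sizeof value);
}

/* Rebuilds the value from those bytes, again via memcpy. */
uint32_t u32_from_bytes(const unsigned char in[sizeof(uint32_t)])
{
    uint32_t value;
    memcpy(&value, in, sizeof value);
    return value;
}
```

Which byte comes first depends on the platform's endianness, but the round trip is well defined everywhere.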
> and you've gone to the trouble of refusing to accept CHAR_BIT!=8
This one was a head-scratcher for me. Yeah, there's no cost to check for it, but architectures where CHAR_BIT != 8 are rarer even than 24-bit architectures.
I got the impression the author was implying because CHAR_BIT is enforced to be 8 that uint8_t and char are therefore equivalent, but they are different types with very different rules.
E.g. `uint8_t *p = (uint8_t *)&astruct` may violate strict aliasing (if uint8_t is not a character type on that implementation), but `char *p = (char *)&astruct` is guaranteed legal. Then modulo, traps, padding, overflow, promotion, etc.
> I don’t personally do things that require dynamic memory management in C often, so I don’t have many practices for it. I know that wellons & co. have been really liking the arena, and I’d probably like it too if I actually used the heap often. But I don’t, so I have nothing to say.
> If I find myself needing a bunch of dynamic memory allocations and lifetime management, I will simply start using another language–usually rust or C#.
I'm not sure what the modern standards are, but if you are writing in C, pre-allocate as much as possible. Any kind of garbage collection is just extra processing time and ideally you don't want to run out of memory during an allocation mid-execution.
People may frown at C, but nothing beats getting your inner loops into CPU cache. If you can avoid extra fetches from RAM, you can really crank some processing power. Example projects have included computer vision, servers, and a custom neural network - all of which had no business being so fast.
Solid list. The bit about avoiding the preprocessor as much as possible really resonates—using `static inline` functions and `enum` instead of macros makes debugging so much less painful. What's your take on using C11's `_Generic` for type-generic macros? It adds some verbosity but can save you from a lot of runtime type errors.
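For reference, a minimal sketch of the `_Generic` pattern being asked about. `max_of`, `max_int` and `max_double` are made-up names:

```c
#include <assert.h>

static inline int    max_int(int a, int b)          { return a > b ? a : b; }
static inline double max_double(double a, double b) { return a > b ? a : b; }

/* Dispatches on the static type of the first argument (C11 _Generic).
   A type with no matching branch, such as float, is a compile error. */
#define max_of(a, b) _Generic((a), \
    int:    max_int,               \
    double: max_double)((a), (b))

/* The selection happens entirely at compile time. */
static_assert(_Generic(1, int: 1, default: 0), "an int argument selects the int branch");
```

One caveat: only the first argument picks the branch, so `max_of(2, 7.5)` would call `max_int` and silently convert the second argument.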
That made me smile
Now that is some C habit for the modern day... But huh, not C.
My go to language for that is lua. I'm starting to think of it as a C framework more so than its own language.
I started doing that in 1993 on MS-DOS already; thanks to C++ RAII, C already felt outdated in those days.
Arguably, 1993's C has survived better than 1993's C++.
Regarding memory, I recently changed to try to not use dynamic memory, or if I need to, to do it once at startup. Often static memory on startup is sufficient.
Instead use the stack much more and have a limit on how much data the program can handle fixed on startup. It adds the need to think what happens if your system runs out of memory.
Like OP said, it's not a solution for all types of programs. But it makes for very stable software with known and easily tested error states. Also adds a bit of fun in figuring out how to do it.
This.
As someone who spent most of their career as an embedded dev, yes, this is fine for (like parent said) some types of software.
Even for places where you'd think this is a bad idea, it can still be a good approach, for example allocating and mapping all memory up to the limit you are designing for. Honestly this is how engineering is done - you have specified limits in the design, and you work explicitly to those limits.
So "allocate everything at startup" need not be "allocate everything at program startup", it can be "allocate everything at workflow startup", where "workflow" can be a thread, a long-running input-directed sequence of functions, etc.
For example, I am starting a tiny stripped down web-server for a project, and my approach is going to be a single 4 KB[1] block for each request, allocated via a pool (which can expand on pressure up to some maximum) and returned to the pool once the response is sent.
The 4 KB includes at most 14 headers (regardless of each header's size) with the remaining data for the JSON payload. The JSON payload is limited to at most 10 fields. This makes parsing everything "allocate-less" because the array holding pointers to the keys+values of the header is `const char *headers[14]` and to the payload JSON data `const char *fields[10]`.
A request that doesn't fit in any of that will be rejected. This means that everything is simple and the allocation for each request happens once at startup (pool creation) even while parsing the input.
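A stripped-down sketch of that kind of pool, under some assumptions: a fixed pool of 64 blocks, no expansion under pressure, and hypothetical names throughout. Only the 4 KB block size and the `headers[14]` / `fields[10]` arrays come from the description above:

```c
#include <assert.h>
#include <stddef.h>

enum { REQ_BLOCK_SIZE = 4096, REQ_POOL_SIZE = 64,
       REQ_MAX_HEADERS = 14, REQ_MAX_FIELDS = 10 };

struct request {
    char            raw[REQ_BLOCK_SIZE];        /* request bytes as received */
    const char     *headers[REQ_MAX_HEADERS];   /* point into raw */
    const char     *fields[REQ_MAX_FIELDS];     /* point into raw */
    struct request *next_free;                  /* free-list link */
};

static struct request  pool[REQ_POOL_SIZE];     /* reserved before main runs */
static struct request *free_list;

/* Threads every block onto the free list; call once at startup. */
void pool_init(void)
{
    free_list = NULL;
    for (size_t i = 0; i < REQ_POOL_SIZE; i++) {
        pool[i].next_free = free_list;
        free_list = &pool[i];
    }
}

/* Returns NULL when every block is in use, so the caller can reject. */
struct request *request_acquire(void)
{
    struct request *r = free_list;
    if (r != NULL)
        free_list = r->next_free;
    return r;
}

void request_release(struct request *r)
{
    assert(r != NULL);
    r->next_free = free_list;
    free_list = r;
}
```

When `request_acquire` returns NULL the server is at its designed limit and can answer with an error instead of allocating. This version is not thread-safe; a multi-threaded server would need a lock or a pool per thread.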
I'm toying with the idea of doing the same for responses too, instead of writing it out as and when the output is determined during the servicing of the request.
-------------------------
[1] I might switch to 6 KB or 8 KB if requests need more; whatever number is chosen, it's going to be a static number.
Dynamic memory allocation solves the problem of dynamic business requirements.
If you know your requirements up front, static memory initialisation is the way.
For instance, indexing a typed array with an enum is no different than an unordered map of string to int, IF you have all your business requirements up front.
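A tiny sketch of that idea, with a made-up key set and values:

```c
#include <assert.h>

enum fruit { FRUIT_APPLE, FRUIT_PEAR, FRUIT_PLUM, FRUIT_COUNT };

/* Plays the role of a map from key to int when the key set is known up front. */
static int stock[FRUIT_COUNT] = {
    [FRUIT_APPLE] = 12,
    [FRUIT_PEAR]  = 0,
    [FRUIT_PLUM]  = 7,
};

int stock_of(enum fruit f)
{
    assert(f < FRUIT_COUNT);
    return stock[f];
}
```

Lookup is a single array index, there is no hashing, and a misspelled key is a compile error instead of a missing entry at runtime.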
In recent years I had to write some firmware code with C and that was exactly the approach I took. So far I never had need for any dynamic memory and I was surprised how far I can get without it.
This is the way. Allocate all memory upfront. Create an allocator if you need to divvy it up dynamically. Acquire all resources up front. Try to fit everything in stack. Much easier that way.
Only allocate on the heap if you absolutely have to.
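For the "create an allocator" part, a minimal bump-allocator sketch over a static buffer. The names and the 1 KB size are assumptions, and the alignment math relies on the backing buffer itself being maximally aligned:

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

struct arena {
    unsigned char *base;
    size_t         cap;
    size_t         used;
};

/* All memory the program will ever hand out, reserved up front. */
static alignas(max_align_t) unsigned char backing[1024];
static struct arena app_arena = { backing, sizeof backing, 0 };

/* align must be a power of two. Returns NULL when the arena is full. */
void *arena_alloc(struct arena *a, size_t size, size_t align)
{
    assert(a != NULL && align != 0 && (align & (align - 1)) == 0);
    size_t start = (a->used + align - 1) & ~(align - 1);
    if (start > a->cap || size > a->cap - start)
        return NULL;
    a->used = start + size;
    return a->base + start;
}

/* Frees everything at once, e.g. at the end of a request or event. */
void arena_reset(struct arena *a)
{
    a->used = 0;
}
```

Running out of memory becomes an ordinary NULL return at a known limit, which is easy to exercise in tests.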
I've been looking into Ada recently and it has cool safety mechanisms to encourage this same kind of thing. It even allows you to dynamically allocate on the stack for many cases.
You can allocate dynamically on the stack in C as well. Every compiler will give you some form of alloca().
I have some firmware that runs an event loop. There is no malloc anywhere. But I do have an area which gets reset after each event handler call. Useful for passing objects up the call stack.
One other thing I tend to do: anything that needs to live longer than the current call stack gets copied into a queue of some sort. I feel it's kinda doing manually what Rust's borrow checker tries to enforce.
Fun fact: the background image is the "BallsMany" pattern included with MagicWB for the Amiga
(To confirm: download the LhA archive from https://aminet.net/package/util/wb/MagicWB21p then open the archive in 7-zip, extract Patterns/BallsMany then load into an ILBM viewer, e.g. https://www.retroreversing.com/ilbm )
Nice post, but the flashy thing on the side is pretty distracting. I liked the tuples and maybes.
Not distracting at all, it feels nostalgic to me. I'd rather have these flashy things than a million popups and registration forms following you around, which is basically the modern web. I hate it so much. This site is pure balsam for my soul.
Both nostalgic and distracting for me.
Two things I thought while reading the post: Why not typedef _BitInt types for stricter size and accidental promotion control when typedeffing for easier names anyway? I came across a post mentioning using regular arrays instead of strings to avoid the null terminator and off-by-one pitfalls.
I still have a lot of conversion to do before I can try this in my hobby project, but these are interesting ideas.
In modern C you can use static_assert to make this a bit nicer.
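For example, something like this near the top of the file. The message text and the `u8`/`i8` typedefs are just an illustration, following the unsigned char / signed char suggestion upthread:

```c
#include <assert.h>   /* static_assert (a macro in C11/C17, a keyword in C23) */
#include <limits.h>   /* CHAR_BIT */

/* Fails the build, with this message, on any platform where bytes are not 8 bits. */
static_assert(CHAR_BIT == 8, "this code assumes 8-bit bytes");

typedef unsigned char u8;
typedef signed char   i8;
```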
...although it would be a bit of a shame IMHO to add that reflexively in code that doesn't necessarily require it.
https://en.cppreference.com/w/c/language/_Static_assert.html
Even if the code might not end up requiring it, if you write it with the assumption that bytes are 8 bits, it's good to document that with a static assert so someone porting things knows there will be dragons
It's a pretty neat way to drop some corner cases from your mental load without building subtle traps
Gtav
> Additionally, the intent of whether the buffer is used as “raw” memory chunks versus a meaningful u8 is pretty clear from the code that it gets used in, so I’m not worried about confusing intent with it.
It's generally not clear to the compiler, and that can result in missed optimization opportunities.
I really dislike parsing not validating as general advice. IMO this is the true differentiator of type systems that most people should be familiar with instead of "dynamic vs static" or "strong vs weak".
Adding complexity to your type system and to the representation of types within your code has a cost in terms of mental overhead. It's become trendy to have this mental model where the cost of "type safety" is paid in keystrokes but pays for itself in reducing mental overhead for the developers. But in reality you're trading one kind of mental overhead for another, the cost you pay to implement it is extra.
It's like "what are all the ways I could use this wrong" vs "what are all the possibilities that exist". There's no difference in mental overhead between having one tool you can use in 500 ways or 500 tools you can use in 1 way, either way you need to know 500 things, so the difference lies elsewhere. The effort and keystrokes that you use to add type safety can only ever increase the complexity of your project.
If you're going to pay for it, that complexity has to be worth it. Every single project should be making a conscious decision about this on day one. For the cost to be worth it, the rate of iteration has to be low enough and the cost of runtime bugs has to be high enough. Paying the cost is a no brainer on a banking system, spacecraft or low level library depended on by a million developers.
Where I think we've lost the plot is that NOT paying the cost should be a no brainer for stuff like front end web development and video games where there's basically zero cost in small bugs. Typescript is a huge fuck up on the front end, and C++ is a 30 year fuck up in the games industry. Javascript and C have problems and aren't the right languages for those respective jobs, but we completely missed the point of why they got popular and didn't learn anything from it, and we haven't created the right languages yet for either of those two fields.
Same concept and cost/benefit analysis applies to all forms of testing, and formal verification too.
While I broadly agree with your general point, in that engineering is making a set of trade-offs, I don't necessarily agree that ditching type-safety in the example contexts you posted is the appropriate trade-off.[1]
I'll ditch type-safety in experimental/exploratory code; I'll use Lisp (or, more recently, Python) to test if something is a good idea. For anything that ships to production, I think a basic level of type enforcement is necessary, even if you don't want the whole type zoo.
For your Javascript f/end context, I like the proposed TC39 approach (https://github.com/tc39/proposal-type-annotations?tab=readme...). The typing is optional, does not break existing syntax and can still be used to enforce a basic level of type safety if the developer wants it.
----------------------------
[1] I upvoted you anyway. Your broader point is still valid.
I'm not talking about ditching type safety. I'm saying the whole concept of "safe" and "unsafe" as most people on HN understand it is flawed. The interesting part of a type system isn't whether the compiler checks types or if we just go lmao fuck it let's not even bother, it's whether or not you need to represent the types in your code in order for the compiler to check them. For the majority of what people want from type safety in a language like Javascript, the answer is that no, you don't need to, as long as you're willing to not have every single language feature under the sun.
With compiled languages you can statically infer a ton of type information without having to pepper your codebase with repeated references to what something is. Nominal typing essentially boils down to a double-check of your work, you specify the type separately and then purposely assign it to a variable, so that if you make a mistake with either part the compiler picks it up.
But those kinds of double-checks can be done for almost anything (outside of dynamic boundaries like io/dlls) without nominal type signatures in the code, as long as you jettison the ability to change types at runtime. No language as far as I can tell actually does this because we're all so obsessed with the false dichotomy of nominal and dynamic typing.
In JS everyone likes to use string unions in place of enums so let's use that as an example. If you have something that is only ever set as "foo" or "bar", that's effectively a boolean. If you receive that string in another function, make a typo and write if (str == "boo"), then in every single language I'm aware of that passes a compiler check. But it shouldn't, because the compiler has all the information it needs to statically catch that error and fail the build. The set of assignments to that variable and the set of equality checks on it provide the two parts of the double-check.
In a perfect world we'd have 10 of these "middle of the road" strongly typed static languages to choose from that all optimise for minimal type representation in their own unique way. But every time I see one of these projects pop up on HN it gets like 10 comments then disappears into the sunset because the programming community is so enraptured with the nominal type system of C and all the fucking bullshit Bjarne Stroustrup pasted on top of it 40 years ago. So we end up with this silly situation where the only things considered "safe" by the crowd are strict descendants of C/C++ with the array/pointer/string screw-ups that made those languages unsafe removed.
really cool website, what's your colour palette?
> I think one of the most eye-opening blog posts I read when getting into programming initially was the evergreen parse, don’t validate post
Bro, that was written in 2019. If it's not old enough to drink it's not yet evergreen. But it's also long-winded. A 25-minute read, and y'know what the conclusion is? "Parsing leaves you with a new data structure matching a type, validation checks if some data technically complies with a type (but might not later be parsed correctly)".
I need all the baby programmers in the back to hear me: type systems are bikeshedding. The point of a type is only to restrict computation to a fixed set. This concept can be applied anywhere you need to ensure reliability and simplicity. You don't need a programming language to natively support types in order to implement the concept yourself in that language.
> You don't need a programming language to natively support types in order to implement the concept yourself in that language.
In a programming language that doesn't enforce types, how do you implement
> "Parsing leaves you with a new data structure matching a type, validation checks if some data technically complies with a type (but might not later be parsed correctly)".
#define BEGIN {
#define END }
/* scream! */
/* huh */
Uh that piece of horror was not in the post. Phew.
Yet another C person reinventing things which C++ already has.
C++ has many things, and that is why many programmers want to stick with C
if you don't like those things, then don't use them
It is like those folks who would rather write JSDoc comments than use a linter like TypeScript, because reasons.
Given the C++ adoption in 1990s commercial software and major consumer operating systems (Apple, IBM, Microsoft, Be), I bet that if the FSF with their coding guidelines had not advocated for C, its adoption would not have taken off beyond those days.
"Using a language other than C is like using a non-standard feature: it will cause trouble for users. Even if GCC supports the other language, users may find it inconvenient to have to install the compiler for that other language in order to build your program. So please write in C."
The GNU Coding Standard in 1994, http://web.mit.edu/gnu/doc/html/standards_7.html#SEC12
> Yet another C person reinventing things which C++ already has.
And yet another C++ person salty that people prefer simpler things.
C23 + <compiler C extensions> is hardly simpler as people advocate.