The recent #def #enddef proposal[1] would eliminate the need for backslashes to define readable macros, making this pattern much more pleasant. Fingers crossed for its inclusion in C2Y!
[1] https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3531.txt
While long-form #def's might be nice, even back in ANSI C89 you could get rid of the backslash pattern (or the need to cc -E and run the output through GNU indent or whatever) by "flipping the script" and defining whole files "parameterized" by their macro environment, like https://github.com/c-blake/bst or https://github.com/glouw/ctl/
Add a namespacing macro and you have a whole generics system, unlike that in TFA.
So, it might add more value to have the C standard add an `#include "file.c" name1=val1 name2=val2` preprocessor syntax, where name1 and name2 would be pushed on a "stack" and popped after processing the file. This would let you do "generic modules" of types/functions/whatever with manual instantiation, which kind of fits with C (manual management of memory, bounds checking, etc.), but with preprocessor-assisted "macro scoping" for nested generics. Perhaps an idea to play with in your slimcc fork? A sketch of the pattern as it works today is below.
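To sketch the pattern as it works today (ELEM, NAME, and stack.h are made-up names for illustration, not taken from either linked repo):

    /* stack.h -- a "module" parameterized by its macro environment.
       No include guard on purpose: the includer #defines ELEM (element
       type) and NAME(x) (a namespacing macro) before each inclusion. */
    typedef struct { ELEM *base; int len; } NAME(stack_t);
    static void NAME(push)(NAME(stack_t) *s, ELEM e)
    { s->base[s->len++] = e; }   /* capacity management omitted */

    /* some_file.c -- manual instantiation for ELEM=double */
    #define ELEM double
    #define NAME(x) dbl_##x
    #include "stack.h"           /* defines dbl_stack_t, dbl_push */
    #undef NAME
    #undef ELEM

The proposed `#include "file.c" name1=val1` syntax would just do that #define/#undef push/pop automatically.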
> `#include "file.c" name1=val1 name2=val2`
That's an interesting idea! I think D or Zig's C header importer had similar syntax, I'm definitely gonna do it.
I've been thinking of maybe doing CTL2 with this. Maybe if #def makes it in.
I really don't think the backslashes are that annoying? Seems unnecessary to complicate the spec with stuff like this.
FWIW, Andrew Myers (https://www.cs.cornell.edu/andru/) had some patch to gcc to do this back in the late 90s.
Anyway, as is so often the case, it's about the whole ecosystem: not just the tooling, but the ecosystem of assumptions about & around tooling.
As I mentioned in my other comment, if you want you can always cc -E and re-format the code somehow, although the main times you want to do that are for line-by-line stepping in debuggers or maybe for other cases of "lines as source coordinates" like line-by-line profilers.
Of course, a more elegant solution might be having more "adjustable step sizes/source coordinates" in debuggers, like stepping by single ';'-statement or maybe by single sequence point, rather than just "line orientation". This is, in fact, so natural an idea that it seems a virtual certainty some C debugger has an "expressional step/next", especially if written by a fan more of Lisp than assembly. Of course, at some point a library is just debugged/trusted, but if there are "user hooks" those can be buggy. And if it's performance-critical, better profile reports are never unwelcome.
While addr2line has been a thing forever, I've never heard of an addr2expr - probably because "how would you label it?" So, pros & cons, but ease of use with debuggers/profilers is one reason I think the parameterized-file way is lower friction.
The backslashes themselves make the preprocessor way more complicated for no real advantage (except where they're unavoidable, as in macros).
For every single token you need to check whether there is a splice (backslash + newline) in it. For a single-pass compiler, this contributes to a very slow lexing phase, since a splice can appear anywhere in C/C++ code.
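For example, a splice can land in the middle of an identifier, because line splicing (translation phase 2) happens before tokenization:

    /* these two physical lines lex as the single statement "int answer = 42;" */
    int answ\
    er = 42;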
Not personally interested in this hack, but https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3037.pdf means a struct foo {} defined multiple times with the same fields in the same TU now refers to the same type instead of being UB, and that is a good bugfix.
Neat similarity to Zig's approach to generic types. The generic type is defined as a type constructor: a function that returns a type. Every instantiation of that generic type is an invocation of that function. So the generic growable list type is `fn ArrayList(comptime T: type) type`, and a function that takes two lists of i32 and returns a third is `fn foo(a: ArrayList(i32), b: ArrayList(i32)) ArrayList(i32)`.
If you're reaching for that hack, just use C++? You don't have to go all in on C++-isms, you can always write C-style C++ and only use the features you need.
Yeah, as someone who writes C in C++, every time I see posts bending over backwards trying to fit parameterized types into C I just cringe a little. I understand the appeal of sticking to pure C, but... why do that to yourself? Come on over, we've got lambdas, and operator overloading for those special circumstances... the water's fine!
So maybe you can answer the following question I have: what is a "protected abstract virtual base pure virtual private destructor", and when was the last time you needed one? At least with C, I understand the feature set and how the features interact.
Some people will do as much as they can to hurt themselves, only to avoid using C++.
Note that the newer C versions are basically a "C++ without classes" kind of thing.
Not always a viable option -- especially for embedded and systems programming.
In embedded you are typically stuck on some ancient proprietary compiler and can't take advantage of the latest C versions. Even less so if you need safety standards like MISRA.
That of course doesn't help you with the switch away from C. The question is why they keep updating the language: the only ones with valid reasons not to switch to some more sane language are exactly the ones who can't take advantage of the new features anyway.
i work in an embedded space in the context of devices and safety. if it were as simple as "just use c++ for these projects" most of us would use a subset, and our newer projects try to make this a requirement (we roll our own ETL for example).
however for some niche os specific things, and existing legacy products where oversight is involved, simply rolling out a c++ port of it in the next release is, well, not a reality, and often not worth the bureaucratic investment.
while i have no commentary on the post because i'm not really a c programmer, i think a lot of comments forget some projects have requirements, and sometimes those requirements become obsolete, but you're stuck with what you got until gen2, or lazyloading standardization across teams.
you are so right.. though historically i would have disagreed just by being triggered.
templates are the main thing c++ has over c. it's trivial to circumvent or escape the things you don't 'like' about c++, like new and delete (a personal obstacle), and write good nice modern c++ with templates.
C's _Generic can help, but ultimately, in my opinion, the need for templating is a good reason to go from C to C++.
Here is my experimental library for generic types with some godbolt links to try: https://github.com/uecker/noplate
Sometimes I look at the way C macros are used to simulate generics and wonder to myself... Why don't y'all just put templates into the standard? If the way you're writing C code is by badly imitating C++, then just imitate C++! There's no shame in it!
C++ doesn’t force you to pay for anything you don’t use, so you can just use the C++ compiler at that point and change the few incompatibilities between C and C++.
That said…I agree that there is a lot of syntactic sugar that could be added for free to C.
Maybe you could try to formulate in what sense this approach is actually inferior? IMHO it is superior to C++ templates by being far simpler.
Slighty off-topic, why is he using ptrdiff_t (instead of size_t) for the cap & len types?
Hi Rich, using ptrdiff_t is (alas) the right thing to do: pointer subtraction returns that type, and if the result doesn't fit, you get UB. And ptrdiff_t is a signed type.
Assume you successfully allocate an array "arr" with "sz" elements, where "sz" is of type "size_t". Then "arr + sz" is a valid expression (meaning the same as "&arr[sz]"), because it's OK to compute a pointer one past the last element of an array (but not to dereference it). Next you might be tempted to write "arr + sz - arr" (meaning the same as "&arr[sz] - &arr[0]") and expect it to produce "sz", because it is valid to compute the element offset difference between two pointers into an array (or one past it). However, that difference is always signed, and if "sz" does not fit into "ptrdiff_t", you get UB from the pointer subtraction.
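In code (a hypothetical sketch; assume the allocation succeeds):

    #include <stdint.h>
    #include <stdlib.h>

    size_t sz = (size_t)PTRDIFF_MAX + 2;
    char *arr = malloc(sz);     /* suppose this succeeds */
    char *end = arr + sz;       /* valid: points one past the last element */
    ptrdiff_t n = end - arr;    /* UB: sz does not fit in ptrdiff_t */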
Given that the C standard (or even POSIX, AIUI) doesn't relate ptrdiff_t and size_t to each other, we need to restrict array element counts, before allocation, with two limits:
- nelem <= (size_t)-1 / sizeof(element_type)
- nelem <= PTRDIFF_MAX
(It's easy to forget which standard header #defines PTRDIFF_MAX; surprisingly, it is not <limits.h> but <stdint.h>.)
In general, neither condition implies the other. However, once you have enforced both, you can store the element count as either "size_t" or "ptrdiff_t".
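A minimal sketch of an allocator that enforces both limits (the function name is made up; the only error handling is a NULL return):

    #include <stddef.h>
    #include <stdint.h>
    #include <stdlib.h>

    /* Allocate nelem elements of esize bytes each, enforcing both limits. */
    void *alloc_array(size_t nelem, size_t esize)
    {
        if (esize == 0 || nelem > (size_t)-1 / esize)
            return NULL;    /* total byte count would overflow size_t */
        if (nelem > PTRDIFF_MAX)
            return NULL;    /* later pointer subtraction would be UB */
        return malloc(nelem * esize);
    }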
From one of his other blog posts, "Guidelines for computing sizes and subscripts":
https://nullprogram.com/blog/2024/05/24/
https://www.youtube.com/watch?v=wvtFGa6XJDU
I still don't understand how these arguments make sense for new code. Naturally, sizes should be unsigned, because they represent values which cannot be negative. If you do pointer/size arithmetic, the only solution to avoid overflows is to overflow-check and range-check before computation.
You cannot even check the sign of a signed size to detect an overflow, because signed overflow is undefined!
The remaining argument, from what I can tell, is that comparisons between signed and unsigned sizes are bug-prone. There is, however, a dedicated warning (-Wsign-compare in GCC and Clang) that resolves this instantly.
It makes sense that you should be able to assign a pointer to a size. If the size is signed, this cannot be done, due to its smaller positive range.
Given this, I can't understand the justification. I'm currently using unsigned sizes. If you have anything contradicting, please comment :^)
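For instance, a minimal sketch of the kind of pre-computation check meant here, using unsigned sizes (the name add_sizes is illustrative):

    #include <stddef.h>
    #include <stdint.h>

    /* Compute a + b into *out, reporting overflow instead of wrapping.
       The check itself cannot overflow: unsigned arithmetic is fully
       defined, so a > SIZE_MAX - b exactly detects the overflow case. */
    int add_sizes(size_t a, size_t b, size_t *out)
    {
        if (a > SIZE_MAX - b)
            return 0;    /* would overflow */
        *out = a + b;
        return 1;
    }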
Skeeto and Stroustrup are a bit confused about valid index types. They prefer signed, which will lead to overflows on negative values, but has the advantage of using only half of the valid range, so there's more heap for the rest. Very confused.
I think this is an interesting change. Even though I (as someone who has loved C for 30+ years and uses it daily in a professional capacity) don't immediately see a lot of use-cases, I'm sure they can be found, as the author demonstrates. Cool, and a good post!
Combined with C23's auto (see vec_for), you can technically backport the entirety of C++'s STL (of course with skeeto's limitation from his last paragraph in mind). gcc -std=c23. It is a _very_ useful feature for even the mundane, like resizable arrays:
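A minimal sketch in that spirit (illustrative names, not the commenter's exact code; realloc failure handling omitted):

    #include <stddef.h>
    #include <stdlib.h>

    /* With C23 tag compatibility, every expansion of Vec(T) in the same
       TU declares the same struct type, so it can appear in prototypes. */
    #define Vec(T) struct Vec_##T { ptrdiff_t len, cap; T *data; }

    #define vec_push(v, x) do {                                          \
        if ((v)->len == (v)->cap) {                                      \
            (v)->cap = (v)->cap ? 2 * (v)->cap : 8;                      \
            (v)->data = realloc((v)->data,                               \
                                (size_t)(v)->cap * sizeof *(v)->data);   \
        }                                                                \
        (v)->data[(v)->len++] = (x);                                     \
    } while (0)

    /* C23 auto deduces ptrdiff_t for the index */
    #define vec_for(v, i) for (auto i = (ptrdiff_t)0; i < (v).len; i++)

    int sum(Vec(int) v)    /* same type as the Vec(int) below */
    {
        int s = 0;
        vec_for(v, i) s += v.data[i];
        return s;
    }

    int main(void)
    {
        Vec(int) v = {0};
        vec_push(&v, 1);
        vec_push(&v, 2);
        return sum(v);    /* 3 */
    }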
What I don't quite get is why they didn't go all the way and enable full-fledged structural typing for anonymous structs.
Are we getting a non-broken `_Generic` yet? Because that's the thing that made me give up with disgust the last project I tried to write in C. Manually having to do `extern template` a few times is nothing in comparison.
What is a non-broken `_Generic`?
A `_Generic` that only requires its expressions to be valid for the type associated with them, rather than spewing errors everywhere.
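To make the complaint concrete, a hypothetical sketch (LEN is a made-up macro):

    #include <string.h>

    /* Every branch must be a valid expression even when not selected:
       with x an int, the unselected char * branch still contains
       strlen(42), a constraint violation, so this fails to compile
       even though only the int branch would be used. */
    #define LEN(x) _Generic((x), char *: strlen(x), int: 1)

    int f(void) { return LEN(42); }   /* error today */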
i fear this will make sloppy code compile OK more often.
Dear God I hope nobody is committing unreviewed LLM output in C codebases.
No worries, the LLM commits it for you.
Eventually they will generate executables directly.
Can you give an example?
It seems as though this makes it impossible to do the new-type paradigm in C23? If Goose and Beaver differ only in their name, does C now think they're the same type, so that, too bad, we can tell a Beaver to fly even though we deliberately required a Goose?
"Tag compatibility" means that the name has to be the same. The issue the proposal is trying to address is that "struct Goose { float weight; }" and "struct Goose { float weight; }" are different types if declared in different locations of the same translation unit, but the same if declared in different translation units. With tag compatibility, they would always be treated as being the same.
"struct Goose { float weight; }" and "struct Beaver { float weight; }" would remain incompatible, as would "struct { float weight; }" and "struct { float weight; }" (since they're declared without tags.)
Ah, thanks, that makes sense.