
Comment by jacquesm

4 days ago

There are many, many more such issues with that code. The person who posted it is new to C and had an AI help them write the code. That's a recipe for disaster: it means the OP does not actually understand what they wrote. It looks nice, but it is full of footguns, and even though it is a useful learning exercise it is also a great example of why it is better to run battle-tested frameworks than to inexpertly roll your own.

As a learning exercise it is useful, but it should never see production use. What is interesting is that the apparent cleanliness of the code (it reads very well) is obscuring the fact that the quality is actually quite low.

If anything, I think the conclusion should be that AI+novice does not create anything that is usable without expert review, and that probably adds up to a net negative, apart from the fact that the novice will (hopefully) learn something. It would be great if someone could put in the time to do a full review of the code. I have just read through it casually and already picked up a couple of problems; I'm pretty sure that a thorough job would turn up many more.

> What is interesting is that the apparent cleanliness of the code (it reads very well) is obscuring the fact that the quality is actually quite low.

I think this is a general feature and one of the greatest advantages of C. It's simple, and it reads well. Modern C++ and Rust are just horrible to look at.

  • I slightly unironically believe that one of the biggest hindrances to rust's growth is that it adopted the :: syntax from C++ rather than just using a single . for namespacing.

    • I believe that the fanatics in the rust community were the biggest factor. They turned me off what eventually became a decent language. There are some language particulars that were strange choices, but I get that if you want to start over you will try to get it all right this time around. But where the Go authors tried to make the step easy and kept their ego out of it, it feels as if the rust people aimed at creating a new temple rather than just making a new tool. This created a massive chicken-and-egg problem that did not help adoption at all. Oh, and toolchain speed: for non-trivial projects the rust toolchain was terribly slow for the longest time.

      I don't remember any other language's proponents actively attacking the users of other programming languages.

      17 replies →

  • The safer the C code, the more horrible it starts looking though... e.g.

        void my_func(char msg[static 1]);

    • I don't understand why people think this is safer, it's the complete opposite.

      With that `char msg[static 1]` you're telling the compiler that `msg` can't possibly be NULL, which means it will optimize away any NULL check you put inside the function. But callers can still happily pass a pointer that could be NULL, with no warnings whatsoever.

      The end result is that with an "unsafe" `char *msg`, you can at least handle the case of `msg` being NULL. With the "safe" `char msg[static 1]` there's nothing you can do -- if you receive NULL, you're screwed, no way of guarding against it.

      For a demonstration, see [1]. Both gcc and clang are passed `-Wall -Wextra`. Note that the NULL check is removed in the "safe" version (check the assembly). See also the gcc warning about the "useless" NULL check ("'nonnull' argument 'p' compared to NULL"), and, worse, the lack of warnings in clang. And finally, note that neither gcc nor clang warns about the call to the "safe" function with a pointer that could be NULL.

      [1] https://godbolt.org/z/qz6cYPY73
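
      A minimal self-contained sketch of the same effect (the function names here are hypothetical, not the ones in [1]):

          #include <stddef.h>

          /* "Safe" version: [static 1] promises the compiler that msg is
             non-NULL, so gcc flags the check below as useless and the
             optimizer may delete it outright. */
          size_t safe_len(const char msg[static 1]) {
              if (msg == NULL)       /* may be optimized away */
                  return 0;
              size_t n = 0;
              while (msg[n]) n++;
              return n;
          }

          /* Plain-pointer version: the NULL check survives, so a NULL
             argument can actually be handled. */
          size_t plain_len(const char *msg) {
              if (msg == NULL)
                  return 0;
              size_t n = 0;
              while (msg[n]) n++;
              return n;
          }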

      1 reply →

    • Meanwhile, in Modula-2 from 1978, that would be

          PROCEDURE my_func(msg: ARRAY OF CHAR);
      

      Now you can use LOW() and HIGH() to get the lower and upper bounds, and access is naturally bounds-checked unless you disabled the checks, locally or globally.

      4 replies →

> should never see production use.

I have an issue with high-strung opinions like this. I wrote plenty of crappy Delphi code while learning the language that saw production use, and I made a living from it.

Sure, it wasn't the best experience for users; it took years to iron out all the bugs, and there was plenty of frustration during the support phase (mostly null pointer exceptions and DB locks in the GUI).

But nobody would be better off now if that code never saw production use. A lot of business was built around it.

  • Buggy code that just crashes or produces incorrect results is a whole different category. In C, a bug can compromise a server and your users. See the OpenSSL Heartbleed vulnerability as a prime example.

    Once upon a time, you could put up a relatively vulnerable server, and unless you got a ton of traffic, there weren't too many things that would attack it. Nowadays, pretty much anything Internet facing will get a constant stream of probes. Putting up a server requires a stricter mindset than it used to.

  • There are minimum standards for deployment to the open web. I think - and you're of course entirely free to have a different opinion - that those are not met with this code.

    • Yes, I have lots of opinions!

      I guess the question in the spotlight is: at what point would your custom server's buffer overflow when reading a header matter, and would that bug even still exist at that point?

      Could a determined hacker get to your server without even knowing what weird software you cooked up and how to exploit your binary?

      We have a lot of success stories born from bad code. I mean look at Micro$oft.

      Look at all the big players like Discord leaking user credentials. Why would you still call out the little fish?

      Maybe I should create a form for all these ahah.

      7 replies →

Yeah, I recently wrote a moderate amount of C code [1] entirely with Gemini, and while it was much better than what I initially expected, it needed constant steering to avoid inefficient or less safe code. It needed extensive fuzzing to reach even a minimal level of confidence, which caught at least two serious problems---seriously, it's much better than most C programmers, but still.

[1] https://github.com/lifthrasiir/wah/blob/main/wah.h

  • I've been doing this the better part of a lifetime and I still need to be careful, so don't feel bad about it. Just like rust has an 'unsafe' keyword, I realize all of my code is potentially unsafe. Guarding against UB, use-after-free, array overruns and so on is a lot of extra work, and you only need to slip up once to have a bug, and if you're unlucky, something exploitable. You get better at this over the years. But if I know something needs to be bulletproof, the C compiler would not be my first tool of choice.

    One good defense is to reduce your scope continuously: the smaller you make your scope, the smaller the chances of something escaping your attention. Stay away from globals and global data structures. Make it impossible to inspect the contents of a box without going through a well-defined interface. Use assertions liberally. Avoid fault propagation: abort immediately when something is out of the expected range.
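
    A minimal sketch of that "opaque box" discipline (the names are illustrative, not from the posted project):

        #include <assert.h>
        #include <stdlib.h>

        /* Opaque type: the layout lives only in this .c file, so callers
           cannot inspect the contents except through the interface below. */
        struct box {
            size_t len;
            int   *items;
        };

        struct box *box_create(size_t len) {
            struct box *b = malloc(sizeof *b);
            if (!b) return NULL;
            b->items = calloc(len, sizeof *b->items);
            if (!b->items) { free(b); return NULL; }
            b->len = len;
            return b;
        }

        /* Assert liberally; abort immediately on out-of-range access
           rather than letting the fault propagate. */
        int box_get(const struct box *b, size_t i) {
            assert(b != NULL);
            assert(i < b->len);
            return b->items[i];
        }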

    • A strategy that helps me is to not use open-coded pointer arithmetic or string manipulation, but to encapsulate those behind safe bounds-checked interfaces. Then essentially only lifetime issues remain, and for those I usually do have a simple policy and clearly document any exception. I also use signed integers and the sanitizer in trapping mode, which turns any such issue I may have missed into a run-time trap.
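
      A sketch of what such a bounds-checked interface can look like (hypothetical names; the trapping-sanitizer flags shown are clang's):

          #include <stddef.h>
          #include <stdlib.h>

          /* Bounds-checked view over a buffer, using a signed length as
             described above. Build with e.g.
               clang -fsanitize=undefined -fsanitize-trap=undefined
             to turn any UB that slips through into a run-time trap. */
          typedef struct {
              char     *ptr;
              ptrdiff_t len;
          } span;

          /* All element access goes through this: a bad index aborts
             immediately instead of silently corrupting memory. */
          static inline char span_at(span s, ptrdiff_t i) {
              if (i < 0 || i >= s.len)
                  abort();
              return s.ptr[i];
          }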

      5 replies →

  • This is exactly my problem with LLM C code: lack of confidence. On the other hand, when my projects get big enough that I cannot keep the code base loaded into my brain's cache, they eventually get to the point where my confidence comes from extensive testing regardless. So maybe it's not such a bad approach.

    I do think that LLM C code, if made in concert with great testing tooling, has great promise.

  • > It needed an extensive fuzzing to get the minimal amount of confidence, which caught at least two serious problems---seriously, it's much better than most C programmers, but still.

    How are you doing your fuzzing? You need valgrind (or compiler sanitiser flags) in the loop for a decent level of confidence.
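
    For instance, a minimal libFuzzer harness along these lines, with the sanitisers in the build (parse_header is a stand-in for whatever the code under test actually exposes):

        /* fuzz_target.c -- build with, e.g.:
           clang -g -O1 -fsanitize=fuzzer,address,undefined fuzz_target.c */
        #include <stddef.h>
        #include <stdint.h>

        int parse_header(const uint8_t *data, size_t size);  /* assumed API */

        int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
            parse_header(data, size);   /* ASan/UBSan catch memory errors */
            return 0;
        }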

    • The "minimal" amount of confidence, not a decent level of confidence. You are completely right that I need much more to establish anything higher than that.

I agree that it reads really well, which is why I was also surprised the quality is not high when I looked deeper. The author claims to have only used AI for the JSON code, so your conclusion may be off; it could just be a novice doing novice things.

I suppose I was just surprised to find this code promoted in my feed when it's not up to snuff. And I'm not hating, I do in fact love the project idea.

The irony is also that AI could have been used to audit the code and find these issues. All the author had to do was ask.