We have. C is basically the only language in any sort of widespread use where null-terminated strings are a thing.
Which of course causes issues when languages with more proper strings interact with C but there you go.
Given that the C ABI is basically the standard for how arbitrary languages interact, I wouldn't characterize the headaches this causes as arising only when other languages interact with C; arguably they can come up when any two languages interact at all, even if neither is C.
Arguably the C ABI was one of those Worse is Better problems, like the C language itself. Better languages already existed, but C was basically free and easy to implement, so now there's C everywhere. It seems likely that, if not for this ABI, we might today have an ABI where all languages that want to offer FFI agree on how to represent, say, the immutable slice reference type (Rust's &[T], C++'s std::span).
Just an agreed ABI for slices would be enough that language A's growable array type (Rust's Vec, C++'s std::vector, but equally ArrayList, or what some languages just call a "list") of, say, 32-bit signed integers could give language B a read-only reference to look at all those 32-bit signed integers without languages A and B having to agree on how growable arrays work at all. In C today you have to go wrestle with the ABI pig for much less.
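To make that concrete, here is a minimal sketch (in C, since that is the lingua franca under discussion) of the convention you end up hand-rolling today. The struct name and layout are my own invention rather than any standard; the point is that every pair of languages currently has to agree on something like this ad hoc.

    #include <stddef.h>
    #include <stdint.h>

    /* A hand-rolled "slice" over the C ABI: pointer plus length.
       Language B can read ptr[0..len) without knowing anything about
       how language A's growable array allocates or grows. */
    typedef struct {
        const int32_t *ptr;
        size_t len;
    } i32_slice;

    int64_t sum_i32(i32_slice s) {
        int64_t total = 0;
        for (size_t i = 0; i < s.len; i++)
            total += s.ptr[i];
        return total;
    }

Whether such a struct is passed in registers or on the stack is exactly the kind of detail a standardized slice ABI would pin down once, instead of per pair of languages.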
We should move away from it in C usage as well.
Ideally, the standard would include a type that packages a string with its length, and would provide functions that use that type and/or take the length as an argument. But even without that it is possible to avoid using null-terminated strings in a lot of places.
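Something like the following is what I have in mind; the type and function names here are made up for illustration, and nothing like them exists in the standard today.

    #include <stddef.h>
    #include <stdio.h>

    /* Hypothetical counted-string type: the length travels with the
       pointer, and data is not required to be null-terminated. */
    typedef struct {
        const char *data;
        size_t len;
    } str_view;

    /* Even existing printf can consume it safely via the %.*s precision. */
    void print_view(str_view s) {
        printf("%.*s\n", (int)s.len, s.data);
    }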
The standard C library can’t even manipulate NUL terminated strings for common use cases…
Simple things aren’t simple - want to append a formatted string to an existing buffer? Good luck! Now do it with UTF-8!
I truly feel the standard library design did more disservice to C than the language definition itself.
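For what it's worth, here is roughly what "append a formatted string to an existing buffer" looks like with only the standard library. The helper is my own sketch, not anything the library offers: the truncation check has to be done by hand, and nothing here knows about UTF-8 code point boundaries, so truncation can split a multi-byte character.

    #include <stdarg.h>
    #include <stdio.h>
    #include <string.h>

    /* Append a formatted string to a NUL-terminated buffer of capacity cap.
       Returns 0 on success, -1 on error or if the result would not fit. */
    int append_fmt(char *buf, size_t cap, const char *fmt, ...) {
        size_t used = strlen(buf);
        if (used >= cap)
            return -1;                  /* no room left at all */
        va_list ap;
        va_start(ap, fmt);
        int n = vsnprintf(buf + used, cap - used, fmt, ap);
        va_end(ap);
        if (n < 0 || (size_t)n >= cap - used)
            return -1;                  /* encoding error or truncated */
        return 0;
    }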
Doesn't C++'s std::string also use a null terminated char* string internally? Do you count that also?
Since C++11 it is required to be null-terminated, you can access the terminator (e.g. with operator[]), and the string can contain non-terminator null characters.
It has NUL termination for compatibility with C, so you can call c_str and get a C string, with the caveat that a std::string can contain NULs anywhere, which breaks C semantics. But C++ itself does not rely on the terminator.
This doesn't count, because it's implemented in such a way that if you don't need a null-terminated string, you never see it.
>Which of course causes issues when languages with more proper strings interact with C but there you go.
Is it an issue of "more proper strings", or just languages trying to have their cake and eat it too: their own sense of a string plus C interoperability? I think this is where we see the strength of Zig; its strings are designed around and extend the C idea of a string instead of just saying "our way is better" and blaming C for any friction.
My standard disclaimer comes into play here: I am not a programmer, very much a humanities sort, and I could be completely missing what is obvious. Just trying to understand better.
Edit: That was not quite right; Zig's string literals are null-terminated for C compatibility. There is something I am missing here in my understanding of strings in the broader sense.
Yes
And maybe even have an (arch-dependent) string buffer zone where the actual memory length is a multiple of 4 or even 8.