Comment by xyzzyz

1 day ago

> Go was designed by some old-school folks that maybe stuck a bit too hard to their principles, losing sight of the practical conveniences.

I'd say that it's entirely the other way around: they stuck to the practical convenience of solving the problem that they had in front of them, quickly, instead of analyzing the problem from first principles, and solving the problem correctly (or using a solution that was Not Invented Here).

Go's filesystem API is the perfect example. You need to open files? Great, we'll create

  func Open(name string) (*File, error)

function, you can open files now, done. What if the file name is not valid UTF-8, though? Who cares, hasn't happened to me in the first 5 years I used Go.

jerf 1 day ago

While the general question about string encoding is fine, unfortunately in a general-purpose and cross-platform language, a file interface that enforces Unicode correctness is actively broken, in that there are files out in the world it will be unable to interact with. If your language is enforcing that, and it doesn't have a fallback to a bag of bytes, it is broken, you just haven't encountered it. Go is correct on this specific API. I'm not celebrating that fact here, nor do I expect the Go designers are either, but it's still correct.

klodolph 1 day ago
This is one of those things that kind of bugs me about, say, OsStr / OsString in Rust. In theory, it’s a very nice, principled approach to strings (must be UTF-8) and filenames (arbitrary bytes, almost, on Linux & Mac). In practice, the ergonomics around OsStr are horrible. They are missing most of the API that normal strings have… it seems like manipulating them is an afterthought, and it was assumed that people would treat them as opaque (which is wrong).
Go’s more chaotic approach to allow strings to have non-Unicode contents is IMO more ergonomic. You validate that strings are UTF-8 at the place where you care that they are UTF-8. (So I’m agreeing.)
- duckerude 1 day ago
  
  The big problem isn't invalid UTF-8 but invalid UTF-16 (on Windows et al). AIUI Go had nasty bugs around this (https://github.com/golang/go/issues/59971) until it recently adopted WTF-8, an encoding that was actually invented for Rust's OsStr.
  WTF-8 has some inconvenient properties. Concatenating two strings requires special handling. Rust's opaque types can patch over this but I bet Go's WTF-8 handling exposes some unintuitive behavior.
  There is a desire to add a normal string API to OsStr but the details aren't settled. For example: should it be possible to split an OsStr on an OsStr needle? This can be implemented but it'd require switching to OMG-WTF-8 (https://rust-lang.github.io/rfcs/2295-os-str-pattern.html), an encoding with even more special cases. (I've thrown my own hat into this ring with OsStr::slice_encoded_bytes().)
  The current state is pretty sad yeah. If you're OK with losing portability you can use the OsStrExt extension traits.
  
- Kinrany 1 day ago
  
  > You validate that strings are UTF-8 at the place where you care that they are UTF-8.
  The problem with this, as with any lack of static typing, is that you now have to validate at _every_ place that cares, or to carefully track whether a value has already been validated, instead of validating once and letting the compiler check that it happened.
  
- pas 21 hours ago
  
  It's completely in-line with Rust's approach. Concentrate on the hard stuff that lifts every boat. Like the type system, language features, and keep the standard library very small, and maybe import/adopt very successful packages. (Like once_cell. But since removing things from std is considered a forever no-no, it seems path handling has to be solved by crates. Eg. https://github.com/chipsenkbeil/typed-path )
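Kinrany's point about validating once and letting the compiler track it can be approximated in Go with a wrapper type. This is only a sketch: `UTF8String`, `NewUTF8String`, `ErrInvalidUTF8`, and `RuneCount` are hypothetical names, not standard library APIs, and the guarantee holds only if the type lives in its own package so the field cannot be set directly.

```go
package main

import (
	"errors"
	"fmt"
	"unicode/utf8"
)

// UTF8String is a hypothetical wrapper. If it lived in its own package,
// NewUTF8String would be the only way to build a non-empty value, so holding
// one would imply that validation already happened.
type UTF8String struct {
	s string
}

// ErrInvalidUTF8 is an illustrative sentinel error.
var ErrInvalidUTF8 = errors.New("invalid UTF-8")

// NewUTF8String validates once, at the boundary.
func NewUTF8String(s string) (UTF8String, error) {
	if !utf8.ValidString(s) {
		return UTF8String{}, ErrInvalidUTF8
	}
	return UTF8String{s: s}, nil
}

// String returns the validated contents.
func (u UTF8String) String() string { return u.s }

// RuneCount accepts only the validated type, so it does not re-check.
func RuneCount(u UTF8String) int {
	return utf8.RuneCountInString(u.s)
}

func main() {
	good, err := NewUTF8String("h\u00e9llo")
	fmt.Println(RuneCount(good), err)

	_, err = NewUTF8String("\xbd\xb2")
	fmt.Println(err)
}
```

The zero value wraps the empty string, which is valid UTF-8, so it does not break the invariant.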

stouset 1 day ago

[flagged]

blibble 1 day ago
> Golang makes it easy to do the dumb, wrong, incorrect thing that looks like it works 99.7% of the time. How can that be wrong? It works in almost all cases!
my favorite example of this was the go authors refusing to add monotonic time into the standard library because they confidently misunderstood its necessity
(presumably because clocks at google don't ever step)
then after some huge outages (due to leap seconds) they finally added it
now the libraries are a complete mess because the original clock/time abstractions weren't built with the concept of multiple clocks
and every go program written is littered with terrible bugs due to use of the wrong clock
https://github.com/golang/go/issues/12914 (https://github.com/golang/go/issues/12914#issuecomment-15075... might qualify for the worst comment ever)
- 0cf8612b2e1e 1 day ago
  
  This issue is probably my favorite Goism. Real issue identified and the feedback is, “You shouldn’t run hardware that way. Run servers like Google does without time jumping.” Similar with the original stance to code versioning. Just run a monorepo!
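For context on where that issue ended up: since Go 1.9, values returned by `time.Now` carry a monotonic reading next to the wall clock, and `time.Since` and `Time.Sub` use it when both operands have one. A minimal sketch of measuring elapsed time that way (the `elapsed` helper is a hypothetical name):

```go
package main

import (
	"fmt"
	"time"
)

// elapsed reports how long f takes. Because start comes from time.Now, it
// carries a monotonic reading, and time.Since uses that reading, so a
// wall-clock step while f runs should not yield a negative duration.
func elapsed(f func()) time.Duration {
	start := time.Now()
	f()
	return time.Since(start)
}

func main() {
	d := elapsed(func() { time.Sleep(10 * time.Millisecond) })
	fmt.Println("elapsed:", d)

	// Round(0) strips the monotonic reading, leaving only the wall clock.
	// Durations computed from such values are exposed to clock steps again.
	wallOnly := time.Now().Round(0)
	fmt.Println("wall clock only:", wallOnly)
}
```

Times that pass through serialization or `Round(0)` lose the monotonic reading, which is one place where picking the wrong clock can still happen.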
0x696C6961 1 day ago
[flagged]
- jack_h 1 day ago
  
  It’s not about making zero mistakes, it’s about learning from previous languages which made mistakes and not repeating them. I decided against using go pretty early on because I recognized just how many mistakes they were repeating that would end up haunting maintainers.
  
jen20 1 day ago
I can count on fewer hands the number of times I've been bitten by such things in over 10 years of professional Go vs bitten just in the last three weeks by half-assed Java.
- gf000 1 day ago
  
  There is a lot to say about Java, but the libraries (both standard lib and popular third-party ones) are goddamn battle-hardened, so I have a hard time believing your claim.
  
- stouset 1 day ago
  
  Is golang better than Java? Sure, fine, maybe. I'm not a Java expert so I don't have a dog in the race.
  Should and could golang have been so much better than it is? Would golang have been better if Pike and co. had considered use-cases outside of Google, or looked outward for inspiration even just a little? Unambiguously yes, and none of the changes would have needed it to sacrifice its priorities of language simplicity, compilation speed, etc.
  It is absolutely okay to feel that go is a better language than some of its predecessors while at the same time being utterly frustrated at the very low-hanging, comparatively obvious, missed opportunities for it to have been drastically better.
yehyehboi 7 hours ago

[flagged]

herbstein 1 day ago

Much more egregious is the fact that the API allows returning both an error and a valid file handle. That may be documented to not happen. But look at the Read method instead. It will return both errors and a length you need to handle at the same time.

nasretdinov 1 day ago
The Read() method is certainly an exception rather than a rule. The common convention is to return a nil value upon encountering an error unless there's real value in returning both, e.g. for a partial read that failed in the end but produced some non-empty result nevertheless. It's a rare occasion, yes, but if you absolutely have to handle this case you can. Otherwise you typically ignore the result if err != nil. It's a mess, true, but the real world is also quite messy unfortunately, and Go acknowledges that.
- stouset 1 day ago
  
  Go doesn't acknowledge that. It punts.
  Most of the time if there's a result, there's no error. If there's an error, there's no result. But don't forget to check every time! And make sure you don't make a mistake when you're checking and accidentally use the value anyway, because even though it's technically meaningless it's still nominally a meaningful value since zero values are supposed to be meaningful.
  Oh and make sure to double-check the docs, because the language can't let you know about the cases where both returns are meaningful.
  The real world is messy. And golang doesn't give you advance warning on where the messes are, makes no effort to prevent you from stumbling into them, and stands next to you constantly criticizing you while you clean them up by yourself. "You aren't using that variable any more, clean that up too." "There's no new variables now, so use `err =` instead of `err :=`."

koakuma-chan 1 day ago

> What if the file name is not valid UTF-8

Nothing? Neither Go nor the OS requires file names to be UTF-8, I believe.

zimpenfish 1 day ago
> Nothing?
It breaks. Which is weird because you can create a string which isn't valid UTF-8 (eg "\xbd\xb2\x3d\xbc\x20\xe2\x8c\x98") and print it out with no trouble; you just can't pass it to e.g. `os.Create` or `os.Open`.
(Bash and a variety of other utils will also complain about it not being valid UTF-8; neovim won't save a file under that name; etc.)
- yencabulator 1 day ago
  
  That sounds like your kernel refusing to create that file, nothing to do with Go.
    $ cat main.go
    package main

    import (
        "log"
        "os"
    )

    func main() {
        f, err := os.Create("\xbd\xb2\x3d\xbc\x20\xe2\x8c\x98")
        if err != nil {
            log.Fatalf("create: %v", err)
        }
        _ = f
    }
    $ go run .
    $ ls -1
    ''$'\275\262''='$'\274'' ⌘'
    go.mod
    main.go
  
- kragen 1 day ago
  
  It sounds like you found a bug in your filesystem, not in Golang's API, because you totally can pass that string to those functions and open the file successfully.
johncolanduoni 1 day ago
Well, Windows is an odd beast when 8-bit file names are used. If done naively, you can’t express all valid filenames with even broken UTF-8 and non-valid-Unicode filenames cannot be encoded to UTF-8 without loss or some weird convention.
You can do something like WTF-8 (not a misspelling, alas) to make it bidirectional. Rust does this under the hood but doesn’t expose the internal representation.
- jstimpfle 1 day ago
  
  What do you mean by "when 8-bit filenames are used"? Do you mean the -A APIs, like CreateFileA()? Those do not take UTF-8, mind you -- unless you are using a relatively recent version of Windows that allows you to run your process with a UTF-8 codepage.
  In general, Windows filenames are Unicode and you can always express those filenames by using the -W APIs (like CreateFileW()).
  
- andyferris 1 day ago
  
  I believe the same is true on linux, which only cares about 0x2f bytes (i.e. /)
  

nasretdinov 1 day ago

Note that Go strings can be invalid UTF-8; they dropped panicking on encountering an invalid UTF-8 string before 1.0, I think.

xyzzyz 1 day ago
This also epitomizes the issue. What's the point of having `string` type at all, if it doesn't allow you to make any extra assumptions about the contents beyond `[]byte`? The answer is that they planned to make conversion to `string` error out when it's invalid UTF-8, and then assume that `string`s are valid UTF-8, but then it caused problems elsewhere, so they dropped it for immediate practical convenience.
- tialaramex 1 day ago
  
  Rust apparently got relatively close to not having &str as a primitive type and instead only providing a library alias to &[u8] when Rust 1.0 shipped.
  Score another for Rust's Safety Culture. It would be convenient to just have &str as an alias for &[u8] but if that mistake had been allowed all the safety checking that Rust now does centrally has to be owned by every single user forever. Instead of a few dozen checks overseen by experts there'd be myriad sprinkled across every project and always ready to bite you.
  
- 0x000xca0xfe 1 day ago
  
  Why not use utf8.ValidString in the places it is needed? Why burden one of the most basic data types with highly specific format checks?
  It's far better to get some � when working with messy data instead of applications refusing to work and erroring out left and right.
  
- roncesvalles 1 day ago
  
  I've always thought the point of the string type was for indexing. One index of a string is always one character, but characters are sometimes composed of multiple bytes.
  
- assbuttbuttass 1 day ago
  
  string is just an immutable []byte. It's actually one of my favorite things about Go that strings can contain invalid utf-8, so you don't end up with the Rust mess of String vs OsString vs PathBuf vs Vec<u8>. It's all just string
  
- naikrovek 1 day ago
  
  I think maybe you've forgotten about the rune type. Rune does make assumptions.
  []rune is for sequences of Unicode characters. rune is an alias for int32. string, I think, is an alias for []byte.
  

silverwind 1 day ago

> What if the file name is not valid UTF-8, though

They could support passing filename as `string | []byte`. But wait, go does not even have union types.

lblume 1 day ago

But []byte, or a wrapper like Path, is enough, if strings are easily convertible into it. Rust does it that way via the AsRef<T> trait.
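Go has no general union types, but since Go 1.18 a generic type constraint can express "string or []byte" for function parameters, which is roughly the shape being discussed here. A minimal sketch, assuming hypothetical names `PathLike` and `OpenPath` (neither exists in the `os` package):

```go
package main

import (
	"fmt"
	"os"
)

// PathLike is a hypothetical constraint: any type whose underlying type is
// string or []byte.
type PathLike interface {
	~string | ~[]byte
}

// OpenPath is an illustrative wrapper around os.Open. The conversion to
// string copies the bytes as-is and performs no UTF-8 validation.
func OpenPath[T PathLike](name T) (*os.File, error) {
	return os.Open(string(name))
}

func main() {
	_, errS := OpenPath("/does/not/exist")
	_, errB := OpenPath([]byte("/does/not/exist"))
	fmt.Println(errS != nil, errB != nil)
}
```

Because a Go string can already hold arbitrary bytes, the wrapper adds convenience rather than capability. It is closer to Rust's `AsRef` conversion than to a true sum type, since such a constraint cannot be used as the type of a variable or struct field.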

kragen 1 day ago

If the filename is not valid UTF-8, Golang can still open the file without a problem, as long as your filesystem doesn't attempt to be clever. Linux ext4fs and Go both consider filenames to be binary strings except that they cannot contain NULs.

This is one of the minor errors in the post.

ants_everywhere 1 day ago

> they stuck to the practical convenience of solving the problem that they had in front of them, quickly, instead of analyzing the problem from first principles, and solving the problem correctly (or using a solution that was Not Invented Here).

I've said this before, but much of Go's design looks like it's imitating the C++ style at Google. When I see comments where people say they like something about Go, it's often an idiom that showed up first in the C++ macros or tooling.

I used to check this before I left Google, and I'm sure it's becoming less true over time. But to me it looks like the idea of Go was basically "what if we created a Python-like compiled language that was easier to onboard than C++ but which still had our C++ ergonomics?"

shrubble 1 day ago
Didn’t Go come out of a language that was written for Plan9, thus pre-dating Rob Pike’s work at Google?
- pjmlp 10 hours ago
  
  Kind of, Limbo, written for Inferno, taking into consideration what made Alef's design for Plan 9 a failure, like not having garbage collection.
- kragen 1 day ago
  
  Yes, Golang is superficially almost identical to Pike's Newsqueak.
  
- ants_everywhere 1 day ago
  
  not that I recall but I may not be recalling correctly.
  But certainly, anyone will bring their previous experience to the project, so there must be some Plan9 influence in there somewhere
  

perryizgr8 17 hours ago

> What if the file name is not valid UTF-8, though?

Then make it valid UTF-8. If you try to solve the long tail of issues in a commonly used function of the library, it's going to cause a lot of pain. This approach is better. If someone has a weird problem like file names with invalid characters, they can solve it themselves, even publish a package. Why complicate 100% of uses for solving 0.01% of issues?

nomel 17 hours ago

> Then make it valid UTF-8.
I think you misunderstand. How do you do that for a file that exists on disk that's trying to be read? Rename it for them? They may not like that.