Comment by stouset

1 year ago

I feel like I’m taking crazy pills. How are these APIs even remotely defensible?

    slices.Sort(s)          // fine
    slices.Compact(s)       // broken
    s  = slices.Compact(s)  // fine
    s := slices.Compact(s)  // broken (!!!)
    slices.Delete(s, …)     // broken
    s = slices.Delete(s, …) // fine

How is one intended to remember which functions require overwriting (due to invalidating) their input and which don’t? Why does the language make it so easy to render function parameters unusable upon return but impossible to enforce that they aren’t used afterward?

How on earth did it take twelve years for this “simple” language to make a function to delete elements from an array, with `s = append(s[:start], s[end:]...)` having to suffice until then? How on earth does the “better” API twelve years later have such a gaping footgun? This is a core type that quite literally every program uses. How have things gone so far off the rails that “setting the obsolete pointers to nil” is such an intractable problem for end users they had to add a new keyword to the language?

For other languages I see posts where weird language corner cases bring up challenging issues that really reinforce the idea that language design is hard. Rust—for example—has unsoundness issues in corner cases of the type system. But for go, it feels like there’s a constant stream of own goals on core areas of the language where the design should have knocked it out of the park. “Simple manipulation of arrays” just should not have this many footguns.

It's pretty simple really: Go's slice API was compromised from the start by making it the unholy hybrid of a list/vector and an actual slice, it's one of the original sins of the language. And it's not really fixable.

It must be said that even splitting them properly you'd have issues as long as you mix slices and mutability (e.g. take a full slice out of a vector then delete() on the vector, the slice will see the hole). Though the issues would be less prominent.

  • The way it is listed, it definitely looks problematic.

    I had to dig a little and in fact, once we remember that a slice is a view into an array and that "some" of these methods return slices, it's actually fine.

    The only issue is perhaps s:=slices.Compact() But that's not even because of the API. It's because of the short variable assignment that may allow shadowing, unfortunately.

    The issue is probably overblown here.

    To be even more thorough, I've had the pleasure (lol) to implement a wrapper to have some form of immutable slices so I can say that it is mitigable. Not pleasant but mitigable. (I needed to compare values of a slice stored in a map, value before retrieval from map vs. value when it had been inserted back into the map, so having aliasing was problematic since the value in the map would be updated at the same time the value retrieved was (obviously, duh!) , had to implement a copy-on-write variant).

    :)

    • > once we remember that a slice is a view into an array and that "some" of these methods return slices, it's actually fine.

      Only in the meme sense, it does not actually solve or help solve the problem in any way.

      3 replies →

Without defending this API, the easiest way to go about avoiding bugs when working with slice mutating functions is to consider all those "fine" scenarios as not fine.

Always assume that only the return value of slice mutating functions are the valid reference and the old one always invalid. This is not always completely accurate, but it is very useful in that, it is also never "wrong".

  • > Without defending this API, the easiest way to go about avoiding bugs when working with slice mutating functions is to consider all those "fine" scenarios as not fine. Always assume that only the return value of slice mutating functions are the valid reference and the old one always invalid.

    The first "fine" scenario is fine because `slices.Sort` works in place, it doesn't return a value at all.

    And the other "fine" versions do essentially what you advocate, by overwriting the invalid reference with the new one.

    • The nil return assignment is a compile time error, so easy to spot. It is not supposed to be the defence of the API or a silver bullet, just a practical rule that can save you some hustle.

Possibly that's mostly out of familiarity with the language? The only thing in your example that does things in-place (and thus looks out-of-place) is Sort(), but that's the way I'd at least expect it to work? If you take that away from the list all of them behave similarly to each other and return the modified slice:

    slices.Compact(s)       // modified slice in the return value is ignored
    s  = slices.Compact(s)  // s now points to the new changed slice
    s := slices.Compact(s)  // s already exists, this is not valid Go syntax.
    slices.Delete(s, …)     // modified slice in the return value is ignored
    s = slices.Delete(s, …) // s now points to the new changed slice

EDIT: Would prefer people not to downvote actual discussion. In this case there were was indeed good argument made on the reply that these also modify the underlying slice, but it's not like I was being rude in the comment.

  • > The only thing in your example that does things in-place (and thus looks out-of-place)

    That is not really correct, and is much of the issue: Compact and Delete operate in-place on the backing buffer while copying the slice metadata. This is the same issue as the append() builtin and why it's so fraught (before you get in all the liveness stuff).

    > s := slices.Compact(s) // s already exists, this is not valid Go syntax.

        s := []int{}
        if true {
            s := slices.Compact(s)
            _ = s
        }

    • Ah, my bad for not reading the article before the comments :)

      As that's the case it's indeed hard to defend it. Data structure-wise it still kind of makes sense since as you mentioned the slice metadata is changed, but it basically making the old slice invalid is rather annoying.

      For the := example sure, it's a bit far fetched example and likely would not pass any code review, but there are cases where shadowing is indeed valid. So is the `s := slices.Compact(s)` in this example not working as expected then?

      EDIT: looking at another reply to the parent the := being broken likely is trying to point that using := also modifies the slice and thus the original slice is broken when one tries to use it in the original scope. That's really bad practice, but true indeed.

I recently wanted to split and iterate on a String (`char *`) in the Git project. I was not feeling good that it had come to this point since I'm not a C programmer. ChatGPT told me that I wanted `strtok`. So I tried that but then the compiler complained that it was banned (via a macro).[1]

Looking at the commit log I found out that `string_list.h` in the project was something that I could use. There's functions and initializers for duplicating and not duplicating the string. So I chose `STRING_LIST_INIT_DUP` and used `string_list_split` on `\n`.[2] Then I could use a regular for-loop since the struct has `nr` which is the count. And then it just worked.

I was pleasantly surprised that there was such a nice library for "list of strings" in C.[2] And the price you pay compared to more modern languages is an extra allocation (or one per sub-string?). And although I have only used it for that one thing I didn't run into any footguns.

Languages designed way after C (like in this century) should keep in mind what they want to give the user the ability to do: just make simple immutable types (like `String` in Java (of course there's `StringBuilder)), give immutable/mutable pairs, and also if they want to give access to no-allocation slices (like Rust). And if they go all the way with this they really need to give a large enough API surface so that all of these different concerns don't get mixed up. And so that users don't have to think about the underlying arrays all the time in order to not do something unintentional.

[1] Apparently `strtok` is not nice to use because it can change the string you pass to it or something. I didn't read a lot about it.

[2] The documentation tells you when you should be using dup/no-dup. And it makes sense to duplicate/allocate in this case, I think. Intuitively to me it seems that I'm paying for an allocation in order to not have to mess up per-char iteration at all. And that was fine for my use of it.

[3] And that the macro stopped me from going down the wrong path with `strtok`!

  • I haven’t used strtok in a long time but my recollection is that it mutates the original string by placing a NULL value at the next delimiter so “hello world” would become “hello<0x00>world” if splitting on spaces. This lets you loop with the same string passed to strtok until it is done.

    It’s ugly. Would not recommend. 0/10

> slices.Compact(s) // broken

I don't quite get what you mean by 'broken' here. You know that the length of a slice can't be altered by passing it to a function. So clearly s will still contain the same number of elements as it did before the call to Compact. Similarly for Delete.

You can ignore the return value of the function and that might introduce a bug in your code. But that's something that could happen with all kinds of functions.