Go(lang): Robust generic functions on slices

2 years ago (go.dev)

I feel like I’m taking crazy pills. How are these APIs even remotely defensible?

    slices.Sort(s)          // fine
    slices.Compact(s)       // broken
    s  = slices.Compact(s)  // fine
    s := slices.Compact(s)  // broken (!!!)
    slices.Delete(s, …)     // broken
    s = slices.Delete(s, …) // fine

How is one intended to remember which functions invalidate (and so require overwriting) their input and which don’t? Why does the language make it so easy to render function parameters unusable upon return, yet impossible to enforce that they aren’t used afterward?

How on earth did it take twelve years for this “simple” language to make a function to delete elements from an array, with `s = append(s[:start], s[end:]...)` having to suffice until then? How on earth does the “better” API twelve years later have such a gaping footgun? This is a core type that quite literally every program uses. How have things gone so far off the rails that “setting the obsolete pointers to nil” is such an intractable problem for end users they had to add a new keyword to the language?

For other languages I see posts where weird language corner cases bring up challenging issues that really reinforce the idea that language design is hard. Rust, for example, has unsoundness issues in corner cases of the type system. But for Go, it feels like there’s a constant stream of own goals in core areas of the language where the design should have knocked it out of the park. “Simple manipulation of arrays” just should not have this many footguns.

  • It's pretty simple really: Go's slice API was compromised from the start by making it an unholy hybrid of a list/vector and an actual slice; it's one of the original sins of the language. And it's not really fixable.

    It must be said that even splitting them properly you'd have issues as long as you mix slices and mutability (e.g. take a full slice out of a vector, then delete() on the vector: the slice will see the hole). Though the issues would be less prominent.
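
    A concrete sketch of that aliasing hazard with today's API (using slices.Delete as the delete(), and assuming Go 1.21+ for the slices package): an old alias of the slice keeps its original length and sees the shifted data through the shared backing array.

        package main

        import (
            "fmt"
            "slices"
        )

        func main() {
            v := []int{1, 2, 3, 4, 5}
            view := v // a "full slice" alias of the same backing array

            v = slices.Delete(v, 1, 3) // drop the elements at index 1 and 2

            fmt.Println(v)    // [1 4 5]
            fmt.Println(view) // still length 5: the compacted prefix plus a
                              // leftover tail (zeroed in newer Go releases)
        }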

    • The way it is listed, it definitely looks problematic.

      I had to dig a little and in fact, once we remember that a slice is a view into an array and that "some" of these methods return slices, it's actually fine.

      The only issue is perhaps s := slices.Compact(s). But that's not even because of the API; it's because of the short variable declaration, which may allow shadowing, unfortunately.

      The issue is probably overblown here.

      To be even more thorough, I've had the pleasure (lol) of implementing a wrapper that gives some form of immutable slices, so I can say that it is mitigable. Not pleasant, but mitigable. (I needed to compare values of a slice stored in a map: the value before retrieval from the map vs. the value after it had been inserted back into the map. Aliasing was problematic there, since the value in the map would be updated at the same time as the retrieved value (obviously, duh!), so I had to implement a copy-on-write variant; a rough sketch follows below.)
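
      The idea, in a much simplified sketch (names invented here): reads alias the stored data, while any "mutation" clones the backing array first, so the value kept in the map and the value read back from it can no longer alias each other.

          package cow

          import "slices"

          // Immutable is a hypothetical copy-on-write wrapper around a slice:
          // cheap reads, cloning only when someone asks for a modified version.
          type Immutable[T any] struct {
              data []T
          }

          func From[T any](s []T) Immutable[T] { return Immutable[T]{data: s} }

          func (im Immutable[T]) At(i int) T { return im.data[i] }
          func (im Immutable[T]) Len() int   { return len(im.data) }

          // Set returns a new Immutable with index i replaced by v; the receiver
          // (and anything else still holding the old data) is left untouched.
          func (im Immutable[T]) Set(i int, v T) Immutable[T] {
              clone := slices.Clone(im.data)
              clone[i] = v
              return Immutable[T]{data: clone}
          }

      (A stricter variant would clone in From as well, so callers can't mutate the wrapped data behind the wrapper's back.)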

      :)

      4 replies →

  • Without defending this API, the easiest way to avoid bugs when working with slice-mutating functions is to consider all those "fine" scenarios as not fine.

    Always assume that only the return value of a slice-mutating function is the valid reference and that the old one is always invalid. This is not always completely accurate, but it is useful in that it is also never "wrong".

    • > Without defending this API, the easiest way to avoid bugs when working with slice-mutating functions is to consider all those "fine" scenarios as not fine. Always assume that only the return value of a slice-mutating function is the valid reference and that the old one is always invalid.

      The first "fine" scenario is fine because `slices.Sort` works in place, it doesn't return a value at all.

      And the other "fine" versions do essentially what you advocate, by overwriting the invalid reference with the new one.

      3 replies →

  • Possibly that's mostly a matter of familiarity with the language? The only thing in your example that does things in-place (and thus looks out-of-place) is Sort(), but that's at least the way I'd expect it to work. If you take that away from the list, all of them behave similarly to each other and return the modified slice:

        slices.Compact(s)       // modified slice in the return value is ignored
        s  = slices.Compact(s)  // s now points to the new changed slice
        s := slices.Compact(s)  // s already exists, this is not valid Go syntax.
        slices.Delete(s, …)     // modified slice in the return value is ignored
        s = slices.Delete(s, …) // s now points to the new changed slice
    
    

    EDIT: Would prefer people not to downvote actual discussion. In this case there was indeed a good argument made in the reply that these also modify the underlying slice, but it's not like I was being rude in the comment.

    • > The only thing in your example that does things in-place (and thus looks out-of-place)

      That is not really correct, and that is much of the issue: Compact and Delete operate in place on the backing buffer while copying the slice metadata. This is the same issue as with the append() builtin and why it's so fraught (before you even get into all the liveness stuff).

      > s := slices.Compact(s) // s already exists, this is not valid Go syntax.

          s := []int{}
          if true {
              s := slices.Compact(s)
              _ = s
          }
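
      And to demonstrate the in-place point (assuming Go 1.21+ for the slices package): even when the return value is assigned to a fresh variable, the input's backing array has been rewritten.

          package main

          import (
              "fmt"
              "slices"
          )

          func main() {
              s := []int{1, 1, 2, 2, 3}
              t := slices.Compact(s)

              fmt.Println(t)          // [1 2 3]
              fmt.Println(s[:len(t)]) // [1 2 3] as well: t and s share a backing array
              fmt.Println(len(s))     // still 5; the tail past len(t) is leftovers
                                      // (zeroed in newer Go releases)
          }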

      1 reply →

  • I recently wanted to split and iterate over a string (`char *`) in the Git project. I wasn't feeling good that it had come to this point, since I'm not a C programmer. ChatGPT told me that I wanted `strtok`. So I tried that, but then the compiler complained that it was banned (via a macro).[1]

    Looking at the commit log I found out that `string_list.h` in the project was something I could use. There are functions and initializers for duplicating and not duplicating the string. So I chose `STRING_LIST_INIT_DUP` and used `string_list_split` on `\n`.[2] Then I could use a regular for-loop, since the struct has `nr`, which is the count. And then it just worked.

    I was pleasantly surprised that there was such a nice library for "list of strings" in C.[2] And the price you pay compared to more modern languages is an extra allocation (or one per sub-string?). And although I have only used it for that one thing I didn't run into any footguns.

    Languages designed way after C (like in this century) should keep in mind what they want to give the user the ability to do: just make simple immutable types (like `String` in Java; of course there's `StringBuilder`), give immutable/mutable pairs, and also, if they want to, give access to no-allocation slices (like Rust). And if they go all the way with this they really need to give a large enough API surface so that all of these different concerns don't get mixed up, and so that users don't have to think about the underlying arrays all the time in order not to do something unintentional.

    [1] Apparently `strtok` is not nice to use because it can change the string you pass to it or something. I didn't read a lot about it.

    [2] The documentation tells you when you should be using dup/no-dup. And it makes sense to duplicate/allocate in this case, I think. Intuitively to me it seems that I'm paying for an allocation in order to not have to mess up per-char iteration at all. And that was fine for my use of it.

    [3] And that the macro stopped me from going down the wrong path with `strtok`!

    • I haven't used strtok in a long time but my recollection is that it mutates the original string by placing a NUL byte at the next delimiter, so "hello world" would become "hello<0x00>world" if splitting on spaces. This lets you keep calling strtok in a loop (passing NULL after the first call) until it is done.

      It’s ugly. Would not recommend. 0/10

  • > slices.Compact(s) // broken

    I don't quite get what you mean by 'broken' here. You know that the length of a slice can't be altered by passing it to a function. So clearly s will still contain the same number of elements as it did before the call to Compact. Similarly for Delete.

    You can ignore the return value of the function and that might introduce a bug in your code. But that's something that could happen with all kinds of functions.
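
    For reference, here is a small sketch of what ignoring the return value leaves behind (assuming Go 1.21+ for the slices package):

        package main

        import (
            "fmt"
            "slices"
        )

        func main() {
            s := []int{10, 20, 30, 40, 50}
            slices.Delete(s, 1, 3) // result discarded

            fmt.Println(len(s)) // still 5, as noted above
            fmt.Println(s)      // but the contents were shifted in place:
                                // [10 40 50 ...] with a leftover tail
                                // (zeroed in newer Go releases)
        }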

The problems with the API they point out are almost all things that Rust's ownership system was built to solve.

Things like:

    slices.Sort(s) // correct
    slices.Compact(s) // incorrect
    slices.Delete(s, ...) // incorrect
    s := slices.Delete(s, ...) // incorrect if 's' is referenced again in the outer scope
    s = slices.Delete(s, ...) // correct

All of those are solved by having functions like 'slices.Sort' take a '&mut' reference in Rust speak, and having 'slices.Compact' and 'Delete' take an owned slice and return a new owned slice.

  • IMHO, all of this comes from the million-dollar mistake that Go made from the very beginning, with slices being a bastard hybrid of an owned and a non-owned data container. Indeed, any function accepting a slice may simply use it as an immutable view on whatever data it needs, or modify it, potentially re-alloc it, and wreak havoc by invalidating all previous references to it and/or its elements. And God forbid you passed a subslice to such a function: you're in for a nasty surprise.

    Even without going the whole nine yards of the Rust memory model, something like views in C++, or the Java {Array,Linked}List containing references to objects and thus being resistant to re-allocation, or even plainly immutable lists like in OCaml, are all working and simple-enough solutions to the issue.

    I still can't wrap my mind around how supposedly experienced people like Pike & co. could have designed slices, then gone "yep, that makes perfect sense for a robust, basic, widely used data structure to create our language around", outside of a few toy programs.

    • my feeling with slices is that Go wanted to really make programmers mindful of the cost of allocating new space on array operations.

      By having this weird API on slices, they force you to be explicit about allocating new memory.
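
      Whether or not that was the motivation, the cost model does leak through append: staying within capacity reuses (and aliases) the backing array, while growing past it quietly allocates a new one.

          package main

          import "fmt"

          func main() {
              a := make([]int, 2, 4) // len 2, with 2 spare slots of capacity
              b := append(a, 9)      // fits in the spare capacity: no allocation,
                                     // so a and b share the same backing array
              b[0] = 100
              fmt.Println(a[0]) // 100: the write through b is visible via a

              c := append(b, 7, 8, 9) // exceeds cap(b): a new array is allocated
              c[0] = -1
              fmt.Println(a[0]) // still 100: c no longer aliases a
          }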

      5 replies →

  • You don't even need a notion of ownership. A distinction between an immutable and a mutable slice should be enough, because an immutable slice can never be changed, which implies that its excess capacity (if any) can't be exploited for optimization.

    • You'd need three types: an immutable slice of an immutable array, a mutable slice of a mutable array (i.e. basically a reference to a mutable array), and an immutable slice of a mutable array.

  • Long before that, the problems with the API are solved by having a separate heap type for length manipulations rather than confusing the interfaces.

    If Delete or Compact are only available there and modify it in place, the problems don't arise in the first place.
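
    A rough sketch of that shape (a hypothetical type, just to illustrate): if length-changing operations exist only as mutating methods on a dedicated type, there is no return value to forget and no stale alias handed back.

        package list

        import "slices"

        // List is a hypothetical growable-array type: the only way to change
        // its length is through methods that update the receiver in place.
        type List[T comparable] struct {
            items []T
        }

        func (l *List[T]) Append(vs ...T) { l.items = append(l.items, vs...) }

        // Delete removes the elements in [i, j) in place.
        func (l *List[T]) Delete(i, j int) { l.items = slices.Delete(l.items, i, j) }

        // Compact removes consecutive duplicate elements in place.
        func (l *List[T]) Compact() { l.items = slices.Compact(l.items) }

        func (l *List[T]) Len() int   { return len(l.items) }
        func (l *List[T]) At(i int) T { return l.items[i] }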

A missing tidbit that may help contextualize this post: one of the things about Go that surprised me is that if you have a slice which does not represent the full capacity of the underlying array, you can go ahead and reslice it up to that full capacity, even though it's a panic to index those hidden elements directly: https://go.dev/play/p/oThz2bNFwgr
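
For example (a minimal inline version of what the playground link shows):

    package main

    import "fmt"

    func main() {
        s := []int{1, 2, 3, 4, 5}
        s = s[:2]      // shrink the view; the backing array still holds 5 ints
        // _ = s[4]    // would panic: index out of range [4] with length 2
        s = s[:cap(s)] // yet reslicing back up to the capacity is legal...
        fmt.Println(s) // [1 2 3 4 5]: ...and the hidden elements reappear
    }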

Consequently, the GC has to assume that anything forward of any given slice into the underlying array may become accessible in the future as there is legal Go that can access it. It's still memory safe, but it surprised me.

I had some code that relied on my incorrect belief that slices could not be resliced up in size to implement some light security boundaries. Fortunately it was still fine, because the code in question simply didn't slice things larger, and it's not like I was allowing arbitrary user-supplied code to run, so it was still correct in what it was doing. But I was expecting the runtime to scream if I did somehow screw it up, when in fact it may not, depending on the exact capacity and what happened when.

It's also asymmetric, as far as I know; you can slice forward into the array if there is capacity, but if you've got a slice that starts after index 0 in the backing array you can't use that slice to walk back into the underlying array. That is, with

     s := []int{11, 12, 13, 14, 15}
     s = s[2:]

as far as I know, 11 and 12 are no longer accessible to any legal Go code (not using "unsafe") after that second line executes.

Corrections (again not involving "unsafe", it's obvious those two values are still accessible through "unsafe") welcome. I was wrong once, it's easy to believe I could be wrong again.
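
Here's a quick check of what I mean; the commented-out lines are the ways one might try to walk backwards:

    package main

    import "fmt"

    func main() {
        s := []int{11, 12, 13, 14, 15}
        s = s[2:]
        fmt.Println(len(s), cap(s)) // 3 3: the capacity only counts forward, so no
                                    // reslice of s can ever reach 11 or 12 again
        // _ = s[-1]  // compile error: a constant index must not be negative
        // i := -1
        // _ = s[i:]  // run-time panic: slice bounds out of range [-1:]
    }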

  • You can prevent later, longer reslicing by adding a third (capacity) index to the slice expression to shrink the capacity.

      s = s[:3:3]
    

    from your first example link.
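
    For example, once the capacity is pinched off like that, re-extension panics and a later append has to allocate a fresh array instead of writing into the hidden tail:

        package main

        import "fmt"

        func main() {
            s := []int{1, 2, 3, 4, 5}
            t := s[:3:3] // full slice expression: len(t) == 3 and cap(t) == 3

            // t = t[:5]      // would panic: slice bounds out of range [:5] with capacity 3
            t = append(t, 99) // over capacity, so append allocates a new array

            fmt.Println(s) // [1 2 3 4 5]: the original tail was never touched
            fmt.Println(t) // [1 2 3 99]
        }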

  • In your playground example, if you print the capacity and the length before and after “re-extension”, it becomes clear what happened. In fact, accessing item 5 after reduction gives a length panic, whereas accessing item 6 after re-extension gives you a capacity panic.

    Understanding rsc’s “Go Slices” blog post is very helpful here. Coming from Java or something, this exposure of the underlying storage could be jarring, but coming from C, Go slices are basically built-in fat arrays, and this behavior doesn’t surprise me. Maybe it was a design mistake to expose so much of the underlying machinery. YMMV.
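
    Concretely, the two different panics look like this (a small sketch; messages abbreviated):

        package main

        import "fmt"

        func main() {
            s := make([]int, 5, 8)
            s = s[:3]
            fmt.Println(len(s), cap(s)) // 3 8

            // _ = s[4]  // length panic:   index out of range [4] with length 3
            s = s[:6]    // "re-extension" within the capacity is allowed
            // _ = s[:9] // capacity panic: slice bounds out of range [:9] with capacity 8

            fmt.Println(s) // [0 0 0 0 0 0]
        }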

  • It reinforces my belief that language design is hard. Go is supposed to be a simple language; you may write it for a long time, but there will still be some trap you discover once in a while.

  • > ... the GC has to assume that anything forward of any given slice into the underlying array may become accessible in the future as there is legal Go that can access it.

    This property complicates the semantics of slices.Delete() in an interesting way:

      It is out of scope of the current proposal to write anything to the original tail
    

    https://github.com/golang/go/issues/63393

  • Interesting catch, thanks for sharing. My poor memory is already racing, thinking about cases where I may have left traps with the same assumption.

> func Index[S ~[]E, E comparable](s S, v E) int {

After seeing this signature, I think that Go is giving up on its simplicity principle.

  • Generics can be a bit of an eyesore, but Go already had reflection & I was recently mucking around in BigQuery's internals, which are full of `reflect`; having to read through that is painful, & it doesn't even get backed by a type checker

    `func Index(s interface{}, v interface{}) int` both has to deal with incompatible types, & the function body is going to be anything but simple

    (sure, without generics most people wouldn't even write that `func Index`, but ultimately there's plenty of libraries deep in reflect that'd greatly benefit from generics)
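
    To make the comparison concrete, here is a hand-rolled sketch of both shapes (the generic one mirrors the signature quoted above, the reflect one is roughly what the interface{} route forces on you; neither claims to be the stdlib implementation):

        package main

        import (
            "fmt"
            "reflect"
        )

        // Index mirrors the quoted generic signature: the compiler checks that
        // v has the slice's element type at every call site.
        func Index[S ~[]E, E comparable](s S, v E) int {
            for i := range s {
                if s[i] == v {
                    return i
                }
            }
            return -1
        }

        // IndexAny is the pre-generics shape: no type checking at call sites,
        // and the body has to go through reflect just to iterate.
        func IndexAny(s interface{}, v interface{}) int {
            rv := reflect.ValueOf(s)
            for i := 0; i < rv.Len(); i++ {
                if rv.Index(i).Interface() == v {
                    return i
                }
            }
            return -1
        }

        func main() {
            fmt.Println(Index([]string{"a", "b"}, "b")) // 1
            // Index([]string{"a", "b"}, 3)             // rejected at compile time

            fmt.Println(IndexAny([]string{"a", "b"}, 3)) // -1: the mismatch slips through
        }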

    I've also been dealing with temporal.io using interface{} everywhere, got to a point where I wrote typed wrappers using generics to have type checking for signals

  • I was quite upset that they introduced generics. It is a slow march toward a C++-like look and feel and all the eyesore that entails.

    • In C# I feel you needed generics because there was no resizable array without casting from object (the Pareto 80-20 use case), and later async and Task<T>, but I don’t think these problems apply to Go, so it could have done without them (maybe!). Non-userspace generics may be where it’s at, to ease suffering in places.

      As a language design though Elm remains remarkably simple with parametric polymorphism (aka Generics) but it needs other design choices to do so. Elm is the Go of Haskell :-)

From the article:

> Several new functions (Insert, Replace, Delete, etc.) modify the slice. To understand how they work, and how to properly use them, we need to examine the underlying structure of slices. [My emphasis]

What this is telling us, although perhaps the author doesn't appreciate it, is that this isn't an abstraction. If it were an abstraction, we wouldn't need to "examine the underlying structure of slices" in order to use them properly; we could just treat them as opaque. But in Go that's not an option: intimate knowledge of the inner workings is crucial to correct programming in Go.

And that's not necessarily a problem for its main authors, who have such an understanding anyway, but it should have given pause to somebody thinking about how to effectively teach others to use this language. For Go in particular that's a huge problem, because it's intended as a language for software engineering, and for that purpose you do want to rely heavily on abstraction.

I loved C++ for years, but the more I learned about it the less I wanted to deal with it anymore. For the past year or two, Go has increasingly tried to repeat that experience. Should have just stuck with Common Lisp, I guess.