← Back to context

Comment by mstange

13 hours ago

Are there any tools that help finding these kinds of things? Like a profiler that says "80% of the allocated bytes are objects of this type, with 95% of those having that field set to None"

It would be super useful since I think this is pretty likely to be surprising to many users. But the profiler would need to be a particularly-specific refinement of even that: you need to make it obvious that it's not "95% of your Option<Thing>s are None, and your Option<Things> are using X bytes", but that "95% of the bytes used for your Option<Thing>s are used for None versions." Otherwise you could just assume that your non-None ones are just that chunky, or you have that many of them... I haven't seen a profiler with that level of insight, unfortunately.

Perhaps because this feels like a fairly rust-specific gotcha. Especially if you're coming from languages where there's often not much syntactical distinction made between "this is a pointer because I don't want to be copying it" and "this is a pointer because it's optional."

For instance, it's not until now that I actually understood what the sibling comment about the Enum type size discrepancy lint meant: "This lint obviously cannot take the distribution of variants in your running program into account. It is possible that the smaller variants make up less than 1% of all instances, in which case the overhead is negligible and the boxing is counter-productive. Always measure the change this lint suggests." I had always accidentally read this backwards, thinking it meant something more to the effect of "if most of the instances are actually small, then it's not a problem here, but be aware that some of them are much larger so some of your calls to things with this could end up passing much larger types."

  • "You have 400 megabytes of zeros in <this type>" is probably a pretty easy heuristic to add.

    • That may be surprisingly difficult in Rust. We generally think of Option<T> using O to represent None. However, it can actually use any invalid value of T

      4 replies →

The closest I am aware of is clippy (`cargo clippy` in a standard Rust project will run it with default configurations).

Clippy is essentially a linter; and one of its checks catches cases where different enum variants have a significantly different size; with a suggestion to Box the larger variant.

Since this is just a linter, it doesn't actually have any knowledge of how frequently each variant is actually used. It also doesn't address the situation in the article at all.

I've personally found heaptrack[1] pretty good for this task, very straightforward to use and the info is detailed enough. Though, it'll only tell you where they are happening (e.j. allocation rate for Box::new), but not exactly what type they are given that info isn't available at runtime. Usually that kind of thing would be reserved to GC-based languages where they keep track of counts for each object.

1: https://github.com/kde/heaptrack