
Comment by woodruffw

18 hours ago

> Given that "zero-copy" apparently means "in-memory" (a deserialized version of the data necessarily cannot be the same object as the original data), that's not even difficult to do with the Python standard library

This is not what zero-copy means. Here's a working definition[1].

Specifically, it's not just about keeping things in memory; copying within memory is perfectly ordinary. The goal is to avoid making copies (or, more precisely, what Rust would call "clones") and instead to pass the original representation, or views of it, through the program's lifecycle where feasible.

> a deserialized version of the data necessarily cannot be the same object as the original data

rust-asn1 would be an example of a Rust library that doesn't make any copies of data unless you explicitly ask it to. When you load e.g. a Utf8String[2] in rust-asn1, you get a view into the original input buffer, not an intermediate owning object created from that buffer.
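A rough Python analogue of that view-versus-owning-object distinction, as a sketch only: the input bytes below are made up to look vaguely like a tag/length/value record, and nothing here reflects how rust-asn1 is implemented. A `memoryview` slice refers to the original buffer's storage, while `bytes(...)` produces a new object that owns its own copy.

```python
# Hypothetical input: two header bytes followed by a five-byte payload.
data = bytearray(b"\x0c\x05hello trailing")

view = memoryview(data)[2:7]   # no payload bytes copied; refers to data's storage
copy = bytes(data[2:7])        # new object owning its own five bytes

data[2] = ord("H")             # mutate the original buffer in place

view_now = bytes(view)         # the view reflects the change
copy_now = copy                # the copy is unaffected
```

The view stays tied to `data` (`view.obj is data`), which is the Python counterpart of a borrowed slice: cheap to hand around, but only valid as long as the original buffer is kept alive and not resized.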

> (That does, of course, copy data around within memory, but.)

Yes, that's what makes it not zero-copy.

[1]: https://rkyv.org/zero-copy-deserialization.html

[2]: https://docs.rs/asn1/latest/asn1/struct.Utf8String.html

> Yes, that's what makes it not zero-copy.

Yeah, so you'd have to pass around the `BytesIO` instead.

I know that zero-copy doesn't ordinarily mean what I described, but that seemed to be how TFA was using it, based on the logic in the rest of the sentence.

  • > Yeah, so you'd have to pass around the `BytesIO` instead.

    That wouldn’t be zero-copy either: BytesIO is an I/O abstraction over a buffer, so it intentionally masks the “lifetime” of the original buffer. In effect, reading from the BytesIO creates new copies of the underlying data by design, in new `bytes` objects.

    (This is actually a great capsule example of why zero-copy design is difficult in Python: the Pythonic thing to do is to make lots of bytes/string/rich objects as you parse, each of which owns its data, which in turn means copies everywhere.)
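A small sketch of that point, using only the standard library: what `read()` hands back does not track the stream's buffer afterwards, so it behaves as a copy rather than a view. The payload is an arbitrary example value.

```python
import io

payload = b"header:body"
stream = io.BytesIO(payload)

chunk = stream.read(6)         # a bytes object holding b"header"

stream.seek(0)
stream.write(b"HEADER")        # overwrite the same region inside the stream

after = stream.getvalue()      # the stream's contents changed
# chunk still holds the old bytes: it owns its data independently
```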

    • Fair. (You can call `.getbuffer()`, but you still have to keep the underlying BytesIO object "open" somehow.)

      I'm not convinced this is going to bottleneck things, though.

      (On the flip side, I guess the OS is likely to cache any disk write in memory anyway.)
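For reference, a minimal sketch of the `getbuffer()` behaviour mentioned above, assuming CPython's documented semantics: the returned `memoryview` shares storage with the stream, and the stream cannot be closed (or resized) while that view is still exported.

```python
import io

stream = io.BytesIO(b"header:body")
view = stream.getbuffer()      # memoryview over the internal buffer, no copy

view[0:6] = b"HEADER"          # same-length write goes through to the stream
contents = stream.getvalue()

try:
    stream.close()             # refused while the view is still exported
    close_blocked = False
except BufferError:
    close_blocked = True

view.release()                 # drop the export
stream.close()                 # now succeeds
```

So the view is copy-free, but its validity is coupled to the BytesIO object, which is the "keep it open" constraint in practice.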
