Comment by zahlman

1 month ago

> Yes, that's what makes it not zero-copy.

Yeah, so you'd have to pass around the `BytesIO` instead.

I know that zero-copy doesn't ordinarily mean what I described, but that seemed to be how TFA was using it, based on the logic in the rest of the sentence.

3 comments

woodruffw 1 month ago

> Yeah, so you'd have to pass around the `BytesIO` instead.

That wouldn’t be zero-copy either: BytesIO is an I/O abstraction over a buffer, so it intentionally masks the “lifetime” of the original buffer. In effect, reading from the BytesIO creates new copies of the underlying data by design, in new `bytes` objects.

(This is actually a great capsule example of why zero-copy design is difficult in Python: the Pythonic thing to do is to make lots of bytes/string/rich objects as you parse, each of which owns its data, which in turn means copies everywhere.)
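
To make the point above concrete, here is a minimal sketch (the variable names and the sample payload are made up for illustration, not taken from TFA). It shows that `BytesIO.read()` hands back a fresh `bytes` object that is independent of the source buffer, while a `memoryview` slice keeps referencing the original memory:

```python
import io

# Hypothetical payload; a bytearray so we can mutate it and observe aliasing.
payload = bytearray(b"header:body-bytes")

# BytesIO copies a bytearray's contents into its own internal buffer.
stream = io.BytesIO(payload)

# read() returns a new bytes object that owns its data.
chunk = stream.read(6)

# Mutating the original buffer does not affect the stream or the chunk.
payload[0:6] = b"HEADER"
assert chunk == b"header"
assert stream.getvalue() == b"header:body-bytes"

# A memoryview slice, by contrast, is a window onto the original buffer.
view = memoryview(payload)[0:6]
payload[0] = ord("h")
assert bytes(view) == b"hEADER"  # the view sees the mutation: no copy was made
```

The `bytes(view)` call in the last line is itself a copy; it is only there to compare contents.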

zahlman 1 month ago
Fair. (You can call `.getbuffer()`, but you still have to keep the underlying BytesIO object "open" somehow.)
I'm not convinced this is going to bottleneck things, though.
(On the flip side, I guess the OS is likely to cache any disk write in memory anyway.)
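
A rough sketch of the `.getbuffer()` caveat, assuming CPython's documented behavior that a `BytesIO` cannot be resized or closed while a view on it exists (the sample data and names are illustrative only):

```python
import io

stream = io.BytesIO(b"0123456789")

# getbuffer() returns a writable memoryview over the internal buffer.
view = stream.getbuffer()
window = view[2:5]  # slicing a memoryview does not copy either

# Writes through the view are visible in the stream, so the memory is shared.
view[2:5] = b"abc"
assert stream.getvalue() == b"01abc56789"
assert bytes(window) == b"abc"

# While any view is outstanding, the BytesIO refuses to be closed.
try:
    stream.close()
    close_blocked = False
except BufferError:
    close_blocked = True

# Once every view is released, closing works again.
window.release()
view.release()
stream.close()
```

So the views and the `BytesIO` have to be managed together, which is the lifetime bookkeeping the comments above are talking about.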
- carderne 1 month ago
  
  I’m just a casual observer of this thread, but I think you’d find it worthwhile to read up a bit on zero-copy stuff.
  It’s ~impossible in Python (because you don’t control memory) and hard in C/similar (because of use-after-free).
  Rust’s borrow checker makes it easier, but it’s still tricky (for non-trivial applications). You have to do all your transformations and data movements while only referencing the original data.