Comment by westurner

1 month ago

From "Insecurity and Python Pickles" (2024) https://news.ycombinator.com/item?id=39685128 :

> There should be a data-only pickle serialization protocol (that won't serialize or deserialize code).

> How much work would it be to create a pickle protocol that does not exec or eval code?
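As a rough sketch of what "does not exec or eval code" could look like with today's pickle module: the standard library already lets a subclass of `pickle.Unpickler` override `find_class`, and refusing every lookup there leaves only the types that have their own opcodes (dicts, lists, tuples, strings, numbers, bytes, `None`, booleans). The names `DataOnlyUnpickler` and `data_only_loads` are made up for illustration; this is not the proposed protocol 6, and it has not been audited as a security boundary.

```python
import io
import pickle


class DataOnlyUnpickler(pickle.Unpickler):
    """Hypothetical loader that refuses every global lookup.

    Classes and callables reach the unpickler through find_class, so
    raising here means no imported object is available to be called.
    """

    def find_class(self, module, name):
        raise pickle.UnpicklingError(
            f"global '{module}.{name}' is forbidden in a data-only load"
        )


def data_only_loads(data: bytes):
    """Load a pickle, accepting only plain built-in data."""
    return DataOnlyUnpickler(io.BytesIO(data)).load()


if __name__ == "__main__":
    plain = pickle.dumps({"name": "example", "values": [1, 2.5, None]})
    print(data_only_loads(plain))

    class Evil:
        # On a normal load, __reduce__ would make pickle call print(...).
        def __reduce__(self):
            return (print, ("code ran during unpickling",))

    try:
        data_only_loads(pickle.dumps(Evil()))
    except pickle.UnpicklingError as exc:
        print("rejected:", exc)
```

The trade-off is that anything stored as an instance of a custom class is rejected too, so this only helps when the pickles in question hold plain data.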

"Title: Pickle protocol version 6: skipcode pickles" https://discuss.python.org/t/create-a-new-pickle-protocol-ve...

I have to agree with Chris Angelico there:

> Then the obvious question is: Why? Why use pickle? The most likely answer is “because <X> can’t represent what I need to transmit”, but for that to be at all useful to your proposal, you need to show examples that won’t work in well-known safe serializers.

  • Code in packages should be signed.

    Code in pickles should also be signed.

    I have no need for the pickle module now, but years ago I thought there might be a safer way to read data that was already in pickles.

    For backwards compatibility, skipcode=False must be the default, were someone to implement a pickle str parser that doesn't eval code.

    JS/ES/TS Map doesn't map to JSON.

  • Pickle is still good for custom objects (JSON loses methods and also order), graphs and circular refs (JSON breaks), and functions and lambdas (essential for ML and distributed systems), and it is provided out of the box.

    • We're contemplating protocols that don't evaluate or run code; that rules out serializing functions or lambdas (i.e., code).

      Custom objects in Python don't have "order" unless they're using `__slots__` - in which case the application already knows what they are from its own class definition. Similarly, methods don't need to be serialized.

      A general graph is isomorphic to a sequence of nodes plus a sequence of edge definitions. You only need your own lightweight protocol on top.
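To make the nodes-plus-edges point concrete, here is a minimal sketch of such a lightweight protocol on top of JSON. The function names and the `{"nodes": [...], "edges": [...]}` layout are assumptions chosen for illustration, not an existing format. Edges refer to nodes by index, so cycles never turn into nested structures and `json` never sees a circular reference.

```python
import json


def graph_to_json(adjacency: dict[str, list[str]]) -> str:
    """Flatten an adjacency mapping into a node list plus an edge list."""
    # Collect every node name, including ones that only appear as targets.
    nodes = sorted(
        adjacency.keys() | {dst for targets in adjacency.values() for dst in targets}
    )
    index = {name: i for i, name in enumerate(nodes)}
    # Each edge is a [source_index, target_index] pair.
    edges = [
        [index[src], index[dst]]
        for src, targets in adjacency.items()
        for dst in targets
    ]
    return json.dumps({"nodes": nodes, "edges": edges})


def graph_from_json(text: str) -> dict[str, list[str]]:
    """Rebuild the adjacency mapping from the node and edge lists."""
    doc = json.loads(text)
    nodes = doc["nodes"]
    adjacency: dict[str, list[str]] = {name: [] for name in nodes}
    for src, dst in doc["edges"]:
        adjacency[nodes[src]].append(nodes[dst])
    return adjacency


if __name__ == "__main__":
    # A three-node cycle: a -> b -> c -> a.
    cycle = {"a": ["b"], "b": ["c"], "c": ["a"]}
    encoded = graph_to_json(cycle)
    print(encoded)
    print(graph_from_json(encoded) == cycle)
```

Node payloads (attributes, weights) could ride along in the same way, as long as they are plain data.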
