Comment by westurner
1 month ago
From "Insecurity and Python Pickles" (2024) https://news.ycombinator.com/item?id=39685128 :
> There should be a data-only pickle serialization protocol (that won't serialize or deserialize code).
> How much work would it be to create a pickle protocol that does not exec or eval code?
"Title: Pickle protocol version 6: skipcode pickles" https://discuss.python.org/t/create-a-new-pickle-protocol-ve...
I have to agree with Chris Angelico there:
> Then the obvious question is: Why? Why use pickle? The most likely answer is “because <X> can’t represent what I need to transmit”, but for that to be at all useful to your proposal, you need to show examples that won’t work in well-known safe serializers.
Code in packages should be signed.
Code in pickles should also be signed.
I have no need for the pickle module now, but years ago I thought there might be a safer way to read data that was already in pickles.
For backwards compatibility, `skipcode=False` would have to be the default, were someone to implement a pickle parser that doesn't eval code.
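For reading existing pickles without running code, something close to this is already possible with the stdlib. A minimal sketch, assuming the "restricting globals" approach from the pickle docs: override `Unpickler.find_class` so that no class or function can ever be looked up. `DataOnlyUnpickler` and `data_only_loads` are hypothetical names for illustration; this is not the proposed protocol 6 and there is no `skipcode` parameter in the stdlib.

```python
import io
import pickle


class DataOnlyUnpickler(pickle.Unpickler):
    """Hypothetical helper: refuses every global lookup."""

    def find_class(self, module, name):
        # Global lookups requested by the pickle stream end up here.
        # Rejecting all of them means no callable can be imported,
        # so there is nothing for a REDUCE opcode to call.
        raise pickle.UnpicklingError(f"global {module}.{name} is forbidden")


def data_only_loads(data: bytes):
    """Load a pickle made only of plain containers and scalars."""
    return DataOnlyUnpickler(io.BytesIO(data)).load()


class Evil:
    # A classic code-executing pickle: load would call print("owned").
    def __reduce__(self):
        return (print, ("owned",))


plain = pickle.dumps({"a": [1, 2.5, "x"], "b": (None, True)})
hostile = pickle.dumps(Evil())

loaded = data_only_loads(plain)  # plain data round-trips
try:
    data_only_loads(hostile)
    rejected = False
except pickle.UnpicklingError:
    rejected = True  # the lookup of builtins.print was refused
```

This only blocks code lookup; it does nothing about oversized or deeply nested payloads, and any pickle that contains custom class instances will be rejected too.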
JS/ES/TS Map doesn't map to JSON.
Pickle is still good for custom objects (JSON loses methods and also order), graphs and circular refs (JSON breaks), and functions and lambdas (essential for ML and distributed systems), and it is provided out of the box.
We're contemplating protocols that don't evaluate or run code; that rules out serializing functions or lambdas (i.e., code).
Custom objects in Python don't have "order" unless they're using `__slots__` - in which case the application already knows what they are from its own class definition. Similarly, methods don't need to be serialized.
A general graph is isomorphic to a sequence of nodes plus a sequence of edge definitions. You only need your own lightweight protocol on top.
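As a rough sketch of that kind of lightweight protocol: flatten the graph into a list of nodes plus a list of index pairs for the connections, and JSON handles cycles fine. The `Node` class and the `graph_to_json` / `graph_from_json` helpers are hypothetical, and the sketch assumes every neighbor also appears in the list passed in.

```python
import json


class Node:
    """Hypothetical application class with a possibly cyclic neighbor list."""

    def __init__(self, name):
        self.name = name
        self.neighbors = []


def graph_to_json(nodes):
    # Map each node object to its position so edges can refer to indices.
    index = {id(n): i for i, n in enumerate(nodes)}
    return json.dumps({
        "nodes": [n.name for n in nodes],
        "edges": [[index[id(n)], index[id(m)]]
                  for n in nodes for m in n.neighbors],
    })


def graph_from_json(text):
    doc = json.loads(text)
    # First rebuild all nodes, then wire up the edges by index.
    nodes = [Node(name) for name in doc["nodes"]]
    for src, dst in doc["edges"]:
        nodes[src].neighbors.append(nodes[dst])
    return nodes


a, b = Node("a"), Node("b")
a.neighbors.append(b)
b.neighbors.append(a)  # circular reference

restored = graph_from_json(graph_to_json([a, b]))
```

The methods live in the class definition on the receiving side, so only the data crosses the wire.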