PyCon Lithuania 2024

The Ghosts of Distant Objects
2024-04-04, Room 111

Sometimes you have a Python object and you want it somewhere else: maybe you want to save your data to disk and load it again tomorrow; or you want to send some complex parameters over the network.
I'll talk about pickle, the usual way to do this: the ways it can go wrong, how to extend it, and how it compares to other approaches like JSON or storing data in a database. I'll stick a little bit of theory in my talk too.


This talk will be about getting objects from one Python process to another. It is motivated by my experiences doing exactly that inside the Parsl parallel scripting library.

It will be centered around pickle, the standard Python serialization library, but is intended to give a broader view of what it means to get an object from one place to another, so this shouldn't be regarded as "just a pickle tutorial".

First, I'll give an actual one-line pickle tutorial.
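
As a rough sketch of what a one-line round trip with pickle can look like (an illustration only, not necessarily the example used in the talk; `data` is a made-up value):

```python
import pickle

# Any ordinary Python value; the contents here are purely illustrative.
data = {"name": "example", "values": [1, 2.5, None], "nested": {"ok": True}}

# The "one line": serialize to bytes, then deserialize straight back.
restored = pickle.loads(pickle.dumps(data))

print(restored == data)  # True: the copy has the same contents
```

In practice the bytes from `pickle.dumps` would be written to a file or sent over a socket before `pickle.loads` runs somewhere else.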

Then, I will introduce a few other methods such as JSON and SQL, which I'll use for comparison throughout the talk.
I'll give some clear reasons why you might (or might not) want to use these very different methods.
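
One possible reason, sketched here as an assumption rather than as the talk's own argument: JSON only knows a small set of types, so a round trip can quietly change the data, while pickle preserves Python types. The `record` value is made up for illustration.

```python
import json
import pickle

# A tuple value and an integer key: both are fine in Python,
# but neither exists in JSON.
record = {"point": (1, 2), 3: "three"}

via_json = json.loads(json.dumps(record))
via_pickle = pickle.loads(pickle.dumps(record))

print(via_json)    # {'point': [1, 2], '3': 'three'}
print(via_pickle)  # {'point': (1, 2), 3: 'three'}
```

The flip side is that the JSON text can be read by almost any language, while the pickle bytes are only meaningful to Python.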

I'll talk about what kind of object is hard to serialize and some of the ways that pickle can help: for example objects with cycles, and objects that don't represent "values".
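
A minimal sketch of the cycle case, assuming a list that contains itself as the simplest possible cyclic object: pickle rebuilds the cycle, while the standard `json` module refuses to serialize it.

```python
import json
import pickle

# A list whose last element is the list itself: a cycle.
cycle = [1, 2]
cycle.append(cycle)

# pickle remembers objects it has already seen, so the cycle survives.
copy = pickle.loads(pickle.dumps(cycle))
print(copy[2] is copy)  # True

# json has no way to express "this object again" and raises ValueError.
try:
    json.dumps(cycle)
    json_ok = True
except ValueError as exc:
    json_ok = False
    print("json failed:", exc)
```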

I'll show how to extend pickle to understand new classes or to handle existing classes more efficiently, and point to existing libraries you can install as "pickle expansion packs". This will lead on to why you shouldn't unpickle untrusted data.
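
One standard extension hook is the `__reduce__` method, sketched below with two hypothetical classes (`Celsius` and `Surprise` are invented for this example and assume the code runs as a script). The same hook hints at the security problem: the pickle stream names a callable, and unpickling calls it.

```python
import pickle

class Celsius:
    def __init__(self, degrees):
        self.degrees = degrees
        self.cache = {}  # pretend this is large and not worth sending

    def __reduce__(self):
        # Tell pickle: "to rebuild me, call Celsius(degrees)".
        # The cache is left out and recreated empty on the other side.
        return (Celsius, (self.degrees,))

t = pickle.loads(pickle.dumps(Celsius(21.5)))
print(t.degrees)  # 21.5

class Surprise:
    def __reduce__(self):
        # Any importable callable can be named here. This one is harmless,
        # but a hostile pickle could name something destructive instead.
        return (print, ("this ran during unpickling",))

result = pickle.loads(pickle.dumps(Surprise()))
print(result)  # None: we got print's return value, not a Surprise
```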

I'll briefly touch on some complexities like data format versioning, and how that might motivate using your own explicitly designed format rather than letting Python do things for you.
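
A small sketch of one way to handle versioning while staying with pickle, using the standard `__getstate__` and `__setstate__` hooks. The `Settings` class, its fields, and the "version 1" layout are all hypothetical, made up for this example.

```python
import pickle

class Settings:
    VERSION = 2

    def __init__(self, host, port):
        self.host = host
        self.port = port

    def __getstate__(self):
        # Store an explicit version number alongside the data.
        return {"version": self.VERSION, "host": self.host, "port": self.port}

    def __setstate__(self, state):
        # Hypothetical version 1 stored a single "host:port" string
        # and no version field at all.
        if state.get("version", 1) == 1:
            host, _, port = state["address"].partition(":")
            state = {"host": host, "port": int(port)}
        self.host = state["host"]
        self.port = state["port"]

# Current format round trip.
new = pickle.loads(pickle.dumps(Settings("localhost", 8080)))
print(new.host, new.port)

# Simulate loading state written by the hypothetical older version.
old = Settings.__new__(Settings)
old.__setstate__({"address": "example.org:9000"})
print(old.host, old.port)
```

Every old layout needs its own branch in `__setstate__`, which is part of why an explicitly designed format can become the more attractive option.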

I'll talk about Python multiprocessing, which needs to move objects between processes internally in a few different ways, some of which can be quite surprising and cause hangs or performance problems.

I'll mention a little bit of theory about what it means for a deserialized object to be the same as the object you serialized, what it means to pickle a function, and then some other techniques this view unlocks, such as lazy proxying and remote methods.
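
A short sketch of two of these questions, using only the standard library (the specific values are illustrative). A deserialized object is equal to the original but is not the original, and a function is pickled as a name to look up rather than as code.

```python
import math
import pickle

# "Sameness": the copy is equal to the original but is a different object.
# Sharing inside a single pickle is preserved, though.
shared = [1, 2, 3]
pair = (shared, shared)
pair_copy = pickle.loads(pickle.dumps(pair))
print(pair_copy == pair)             # True: equal contents
print(pair_copy[0] is shared)        # False: not the original list
print(pair_copy[0] is pair_copy[1])  # True: still one list, seen twice

# A function is pickled by reference: module name plus function name.
sqrt_copy = pickle.loads(pickle.dumps(math.sqrt))
print(sqrt_copy is math.sqrt)        # True: looked up again, not copied

# A lambda has no importable name, so plain pickle cannot serialize it.
try:
    pickle.dumps(lambda x: x + 1)
    lambda_pickled = True
except (pickle.PicklingError, AttributeError):
    lambda_pickled = False
print(lambda_pickled)                # False
```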

In the end I hope you'll have a better understanding of what it means to move objects around, why you might pick different approaches, and what is happening when you try to debug strange problems.

Ben has worked as a programmer mostly in the fields of high performance computing and functional programming. He's mostly doing Python these days, but he's been paid for at least Haskell, FORTRAN and PHP. He's especially interested in bringing ideas from one language into another.