Pickling

Recall this bit of code from writing coordinates to a file,

for coord in coords:
    f.write(str(coord[0])+' '+str(coord[1])+'\n')

It's simple enough, but converting numbers to strings and bundling the strings together into lines feels like busywork. It is exactly the kind of busywork that ought to be automated, and Python provides a module that does so: it converts almost any Python object to a stream of bytes that can be stored in a file, and converts those bytes back into an equal object later. The module is whimsically called pickle (because it preserves objects the way pickling preserves foods). Its use is straightforward:

# life.py
universe = [ [0, 0, 0, 0, 0, 0, 0, 0],
             [0, 0, 0, 0, 0, 0, 0, 0],
             [0, 0, 0, 1, 0, 0, 0, 0],
             [0, 0, 0, 0, 1, 0, 0, 0],
             [0, 0, 1, 1, 1, 0, 0, 0],
             [0, 0, 0, 0, 0, 0, 0, 0],
             [0, 0, 0, 0, 0, 0, 0, 0],
             [0, 0, 0, 0, 0, 0, 0, 0]
           ]
import pickle

# 'wb' opens the file for writing bytes; the with block closes it for us
with open('pickled_universe.txt', 'wb') as f:
    pickle.dump(universe, f)

# 'rb' opens the file for reading bytes
with open('pickled_universe.txt', 'rb') as f:
    u = pickle.load(f)

print(u)

Output:

>>> 
[[0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 1, 0, 0, 0, 0],
 [0, 0, 0, 0, 1, 0, 0, 0], [0, 0, 1, 1, 1, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0],
 [0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0]]
>>> 

Almost any Python object can be pickled (among the exceptions are odd ducks like sockets for network connections and file handles). Given how little code it takes, why would we not just always use it? The answer is that we will often use it, but not without thinking first. One reason we will sometimes avoid it is that the pickled representation is neither particularly compact nor particularly readable. Python 3 writes a binary format by default, which cannot be read in a text editor at all. Written in pickle's oldest, text-based format (protocol 0), pickled_universe.txt looks like this,

(lp0
(lp1
I0
aI0
aI0
aI0
aI0
aI0
aI0
aI0
aa(lp2
I0
aI0
aI0
aI0
aI0
aI0
aI0
aI0
aa(lp3
I0
aI0
aI0
aI1
...

and is 383 bytes in size (the exact count depends on the pickle protocol and on how the platform writes line endings). That makes it 2 to 3 times as large as our hand-rolled solutions in Options 1 and 2.
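
One way to check figures like these on your own machine is to pickle the universe in memory with pickle.dumps and compare formats. The sketch below assumes Python 3, where the protocol argument selects the format: protocol 0 is the old text-based format shown in the listing above, and leaving it out gives whatever binary format your Python version uses by default. The variable names text_form and default_form are made up for this illustration, and no sizes are quoted because they vary between Python versions.

```python
import pickle

# Same 8x8 glider universe as in life.py, built programmatically
universe = [[0] * 8 for _ in range(8)]
universe[2][3] = 1
universe[3][4] = 1
universe[4][2] = universe[4][3] = universe[4][4] = 1

text_form = pickle.dumps(universe, protocol=0)  # oldest, text-based format
default_form = pickle.dumps(universe)           # this Python's default format

print(text_form[:20])                       # first few bytes of the text format
print(len(text_form), len(default_form))    # sizes in bytes of the two formats

# Both forms unpickle to a list equal to the original
assert pickle.loads(text_form) == universe
assert pickle.loads(default_form) == universe
```

On current Python 3 versions the default binary format should come out smaller than protocol 0, but it gives up what little readability the text format had.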

A more subtle reason is that pickling can be inefficient for some kinds of data. A common situation is wanting to store a dictionary of objects on disk. The dictionary can be pickled, but then reading any single element requires loading and unpickling the entire dictionary first. If the dictionary is large, this can cost a significant amount of processing time and memory. Because this use case is so common, Python provides the shelve module in its standard library. A shelf is like a dictionary that is stored on disk: its keys must be strings, and each value is pickled and unpickled individually as it is accessed.
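
A minimal sketch of shelve in use follows, assuming Python 3.4 or later (where a shelf can be used in a with block). The pattern names, the glider and blinker data, and the file name patterns are invented for this illustration; the temporary directory is only there so the example cleans up after itself.

```python
import os
import shelve
import tempfile

# Hypothetical data: two small Game of Life patterns
glider = [[0, 1, 0], [0, 0, 1], [1, 1, 1]]
blinker = [[1, 1, 1]]

# Work in a temporary directory so the sketch leaves nothing behind
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'patterns')  # shelve may add its own file extension

    # Store each pattern under a string key; values are pickled behind the scenes
    with shelve.open(path) as shelf:
        shelf['glider'] = glider
        shelf['blinker'] = blinker

    # Reopen the shelf and fetch one entry without unpickling the other
    with shelve.open(path) as shelf:
        loaded = shelf['glider']
        keys = sorted(shelf.keys())

print(loaded)
print(keys)
```

Because each value is stored separately, looking up shelf['glider'] only unpickles that one entry, which is the advantage over pickling a whole dictionary in one piece.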