
We recently wanted to archive the download results produced by our crawler. Each result is a Python object (we don't want to save just the HTML or JSON; we want to be able to restore the entire download process), so I turned to Python's built-in pickle library (yes, as in pickled cucumbers) to serialize objects into bytes and deserialize them when needed.

Object serialization and deserialization of Python's built-in pickle library

The following session gives a quick feel for how pickle is used and what it does.

In [2]: import pickle 
In [3]: class A:
        pass 
In [4]: a = A()
In [5]: a.foo = 'hello'
In [6]: a.bar = 2
In [7]: pick_ed = pickle.dumps(a) 
In [8]: pick_ed
Out[8]: b'\x80\x03c__main__\nA\nq\x00)\x81q\x01}q\x02(X\x03\x00\x00\x00fooq\x03X\x05\x00\x00\x00helloq\x04X\x03\x00\x00\x00barq\x05K\x02ub.'
In [9]: unpick = pickle.loads(pick_ed) 
In [10]: unpick
Out[10]: <__main__.A at 0x10ae67278>
In [11]: a
Out[11]: <__main__.A at 0x10ae67128>
In [12]: dir(unpick)
Out[12]:
['__class__',
'__delattr__',
'__dict__',
'__dir__',
'__doc__',
'__eq__',
'__format__',
'__ge__',
'__getattribute__',
'__gt__',
'__hash__',
'__init__',
'__init_subclass__',
'__le__',
'__lt__',
'__module__',
'__ne__',
'__new__',
'__reduce__',
'__reduce_ex__',
'__repr__',
'__setattr__',
'__sizeof__',
'__slotnames__',
'__str__',
'__subclasshook__',
'__weakref__',
'bar',
'foo']
In [13]: unpick.foo
Out[13]: 'hello'
In [14]: unpick.bar
Out[14]: 2

You can see that the usage of pickle is somewhat similar to json, but there are several fundamental differences:

json is a cross-language, general-purpose data exchange format. It is usually represented as text and is human-readable. pickle serializes Python objects and is meant only for Python; its output is binary data that is not human-readable. Moreover, json can serialize only a subset of the built-in types by default, whereas pickle can serialize a much wider range of objects.
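The difference in type coverage is easy to see with a small experiment. The sketch below is only an illustration; the `record` dictionary is a made-up crawl record, not something from the original project.

```python
import json
import pickle
from datetime import datetime

# Hypothetical crawl record containing types json does not handle by default.
record = {"tags": {"news", "tech"}, "fetched_at": datetime(2020, 1, 1, 12, 0)}

try:
    json.dumps(record)
    json_ok = True
except TypeError:
    # json rejects set and datetime unless a custom encoder is supplied.
    json_ok = False

# pickle round-trips the same object without extra work.
restored = pickle.loads(pickle.dumps(record))
```

Here json refuses the record, while the pickled copy compares equal to the original.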

There is also the older built-in marshal module, but it exists mainly to support .pyc files. It does not support custom types and is less complete. For example, it does not handle cyclic references: marshalling an object that refers to itself can hang or crash the Python interpreter.
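For contrast, pickle copes with self-references because it keeps track of the objects it has already serialized. A minimal sketch, with illustrative variable names:

```python
import pickle

# A list that contains itself: the simplest cyclic reference.
cycle = []
cycle.append(cycle)

# pickle remembers objects it has already written, so the cycle is preserved.
restored = pickle.loads(pickle.dumps(cycle))
```

After the round trip, `restored[0]` is the restored list itself rather than a copy of it.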

Version compatibility issues

pickle is Python-specific, and Python comes in different versions (with very large differences between 2 and 3), so we have to consider whether a serialized object can be deserialized by a newer (or older?) version of Python.

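At the time of writing there are 5 pickle protocol versions, numbered 0 to 4. Higher protocol versions require newer versions of Python: protocols 0-2 can be read by Python 2, while protocols 3 and 4 require Python 3. (Python 3.8 later added protocol 5 and made protocol 4 the default.) The documentation describes them as follows: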

Protocol version 0 is the original “human-readable” protocol and is backwards compatible with earlier versions of Python.

Protocol version 1 is an old binary format which is also compatible with earlier versions of Python.

Protocol version 2 was introduced in Python 2.3. It provides much more efficient pickling of new-style classes. Refer to PEP 307 for information about improvements brought by protocol 2. (From this version onward, performance improves significantly.)

Protocol version 3 was added in Python 3.0. It has explicit support for bytes objects and cannot be unpickled by Python 2.x. This is the default protocol, and the recommended protocol when compatibility with other Python 3 versions is required.

Protocol version 4 was added in Python 3.4. It adds support for very large objects, pickling more kinds of objects, and some data format optimizations. Refer to PEP 3154 for information about improvements brought by protocol 4.

Most of pickle's entry points (such as dump(), dumps(), and the Pickler constructor) accept a protocol version argument. The module provides two constants for it:

pickle.HIGHEST_PROTOCOL is 4 at the time of writing

pickle.DEFAULT_PROTOCOL is 3 at the time of writing
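As a rough sketch of how the protocol argument is used, the example below pickles the same made-up dictionary with two protocols and inspects the header of each stream. The choice of protocol 2 is only an example for the case where an older reader must be supported.

```python
import pickle

data = {"status": 200, "url": "https://example.com"}

# Pin an explicit protocol when the reader may be an older Python.
# Protocol 2 is the highest one that Python 2 can read.
legacy = pickle.dumps(data, protocol=2)
newest = pickle.dumps(data, protocol=pickle.HIGHEST_PROTOCOL)

# From protocol 2 onward the stream starts with a PROTO opcode (0x80)
# followed by the protocol number, which is how loads() detects the version.
legacy_version = legacy[1]
newest_version = newest[1]

# No protocol argument is needed when loading.
same = pickle.loads(legacy) == pickle.loads(newest) == data
```

Printing `pickle.HIGHEST_PROTOCOL` and `pickle.DEFAULT_PROTOCOL` on your own interpreter is the most reliable way to see which values apply to you.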

Usage

The interface is similar to that of the built-in json module: dumps() returns the serialized result, while dump() serializes and then writes the result to a file. Likewise, there are load() and loads(). You can specify the protocol version when serializing with dump() or dumps(). This is not required when deserializing, because the version is detected automatically. This is much like the zip command.
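A minimal sketch of the file-based pair, dump() and load(). The file name and the `result` dictionary are placeholders, and a temporary directory is used only to keep the example self-cleaning.

```python
import os
import pickle
import tempfile

result = {"url": "https://example.com", "status": 200}

# Hypothetical archive location; a temporary directory keeps the sketch tidy.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "result.pickle")

    # dump() writes to a file opened in binary mode.
    with open(path, "wb") as f:
        pickle.dump(result, f)

    # load() reads it back; no protocol argument is needed.
    with open(path, "rb") as f:
        loaded = pickle.load(f)
```

Note that the file must be opened in binary mode ("wb" and "rb"), since pickle produces bytes rather than text.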

Serialization of built-in types

Most built-in types support serialization and deserialization. Functions need special attention. A function is serialized only by its name and the module it lives in. Neither the function's code nor its attributes (Python functions are first-class objects and can have attributes) are serialized. This means the module containing the function must be importable in the unpickling environment, otherwise an ImportError or AttributeError is raised.

One interesting consequence: lambda functions cannot be pickled, because they all share the same name, <lambda>.
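Both points can be checked with a short sketch. `json.dumps` is used here simply as a convenient importable function, and `can_pickle` is a hypothetical helper written for this example.

```python
import json
import pickle

# A module-level function is stored as a reference: module name plus function name.
payload = pickle.dumps(json.dumps)
same_function = pickle.loads(payload) is json.dumps

def can_pickle(obj):
    """Return True if pickle.dumps() accepts obj."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError):
        return False

# A lambda's name is "<lambda>", which cannot be looked up in its module.
lambda_ok = can_pickle(lambda x: x + 1)
```

The payload contains only the strings "json" and "dumps", not the function body, and unpickling returns the very same function object.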

Serialization of custom types

As the experiment at the beginning of this article shows, in most cases serialization and deserialization work without any extra effort. Note that deserialization does not call the class's __init__() to initialize the object. Instead, a new uninitialized instance is created and its attributes are then restored (very clever). In pseudocode:

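def save(obj):
    # Record the class and the instance attributes.
    return (obj.__class__, obj.__dict__)

def load(cls, attributes):
    # Create a bare instance without calling __init__().
    obj = cls.__new__(cls)
    # Restore the saved attributes directly.
    obj.__dict__.update(attributes)
    return obj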
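To check that __init__() really is skipped, one can count how often it runs. The `Page` class below is hypothetical and exists only for this experiment.

```python
import pickle

class Page:  # hypothetical class, for illustration only
    init_calls = 0

    def __init__(self, url):
        Page.init_calls += 1
        self.url = url

page = Page("https://example.com")       # __init__ runs once here
restored = pickle.loads(pickle.dumps(page))
calls_after_roundtrip = Page.init_calls  # still 1 if __init__ was skipped
```

The restored object has its `url` attribute even though the counter did not increase during unpickling.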

If you need extra control over serialization, such as choosing which parts of the object's state are saved, you can use the magic methods of the pickle protocol. The most common ones are __getstate__() and __setstate__().
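A common use is leaving out an attribute that cannot be pickled and recreating it on load. The sketch below assumes a hypothetical `Downloader` class that holds a lock; it is not taken from the crawler described in this article.

```python
import pickle
import threading

class Downloader:  # hypothetical class, for illustration only
    def __init__(self, url):
        self.url = url
        self.lock = threading.Lock()  # locks cannot be pickled

    def __getstate__(self):
        # Copy the attributes and leave out the unpicklable lock.
        state = self.__dict__.copy()
        del state["lock"]
        return state

    def __setstate__(self, state):
        # Restore the saved attributes, then recreate the lock.
        self.__dict__.update(state)
        self.lock = threading.Lock()

restored = pickle.loads(pickle.dumps(Downloader("https://example.com")))
```

Without the two methods, pickling this object would fail on the lock attribute.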

Security issues (!)

The pickle documentation opens with a warning: never unpickle data from an untrusted source. Consider the following code:

>>> import pickle
>>> pickle.loads(b"cos\nsystem\n(S'echo hello world'\ntR.")
hello world
0

When this data is unpickled, it imports os.system() and then calls it with an echo command. That command is harmless, but what if it were rm -rf / instead?

The documentation suggests implementing the checks in Unpickler.find_class(). This method is called whenever the pickle stream requests a global (a class or a function), so it can decide what is allowed to load:

import builtins
import io
import pickle
 
safe_builtins = {
    'range',
    'complex',
    'set',
    'frozenset',
    'slice',
}
 
class RestrictedUnpickler(pickle.Unpickler):
 
    def find_class(self, module, name):
        # Only allow safe classes from builtins.
        if module == "builtins" and name in safe_builtins:
            return getattr(builtins, name)
        # Forbid everything else.
        raise pickle.UnpicklingError("global '%s.%s' is forbidden" %
                                     (module, name))
 
def restricted_loads(s):
    """Helper function analogous to pickle.loads()."""
    return RestrictedUnpickler(io.BytesIO(s)).load()
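To see such a restriction in action, here is a self-contained variant of the same idea. The names `ALLOWED_GLOBALS`, `AllowlistUnpickler` and `allowlist_loads` are hypothetical and chosen for this sketch; it is an illustration of the approach, not a complete security measure.

```python
import io
import pickle

# Hypothetical allowlist of (module, name) pairs that may be loaded.
ALLOWED_GLOBALS = {("builtins", "range"), ("builtins", "set")}

class AllowlistUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Resolve only explicitly allowed globals; reject everything else.
        if (module, name) in ALLOWED_GLOBALS:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")

def allowlist_loads(data):
    """Helper analogous to pickle.loads(), using the allowlist above."""
    return AllowlistUnpickler(io.BytesIO(data)).load()

# Plain containers and allowed globals load normally.
safe_value = allowlist_loads(pickle.dumps([1, 2, range(3)]))

# The os.system payload is rejected before anything is executed.
try:
    allowlist_loads(b"cos\nsystem\n(S'echo hello world'\ntR.")
    blocked = False
except pickle.UnpicklingError:
    blocked = True
```

The check happens when the global is looked up, so the forbidden function is never imported or called.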

Compression

pickle does not compress its output automatically. I think this is a good design: it keeps things decoupled. pickle only pickles, and compression is left to other libraries. You can also see for yourself that, although a pickled file is not human-readable, its string contents are still stored as plain ASCII rather than scrambled. You have to call a compression library's compress() yourself. In my case the compressed result was about 1/3 of the original size, which is quite a saving.
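A minimal sketch of combining the two steps, using zlib as one possible compression library (the article does not say which one was used). The `result` dictionary is made up, and the actual ratio depends entirely on the data.

```python
import pickle
import zlib

# Hypothetical crawl result with repetitive HTML, which compresses well.
result = {"url": "https://example.com", "body": "<p>hello</p>" * 1000}

raw = pickle.dumps(result)
compressed = zlib.compress(raw)

# Decompress first, then unpickle.
restored = pickle.loads(zlib.decompress(compressed))
```

Comparing `len(raw)` with `len(compressed)` on your own data shows how much is gained.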

Summary

Keeping every referenced global importable is not entirely easy. The question I have to face is: if I need to open something in the future that I pickled today, will I still be able to?

Several versions are involved here: the project version, the Python version, the pickle protocol version, and the versions of the packages the project depends on. Of these, I think the Python version and the pickle protocol version are easy to deal with, since we can rely on their backward compatibility. The hard part is the project version and the dependency versions. If the pickled object is very complex, an archive made with an old version may well be incompatible with a new version. One possible solution is to pin all dependencies completely, for example by recording their hash values. To restore a given pickle, you would first restore the exact dependencies and the exact project commit from that time.
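One way to record that information is to store it next to the pickled payload. The sketch below is only one possible layout: `archive`, `read_metadata`, the envelope keys, and the sample commit and version strings are all hypothetical.

```python
import pickle
import sys

def archive(obj, commit, dependencies):
    """Pickle obj together with the version details needed to restore it.

    commit and dependencies are hypothetical inputs supplied by the caller,
    e.g. a git commit hash and a {package: version-or-hash} mapping.
    """
    envelope = {
        "python": tuple(sys.version_info[:3]),
        "protocol": pickle.DEFAULT_PROTOCOL,
        "commit": commit,
        "dependencies": dict(dependencies),
        # The payload is pickled separately so the metadata can be read
        # without importing the classes the payload refers to.
        "payload": pickle.dumps(obj, protocol=pickle.DEFAULT_PROTOCOL),
    }
    return pickle.dumps(envelope, protocol=pickle.DEFAULT_PROTOCOL)

def read_metadata(blob):
    """Return the recorded versions without unpickling the payload itself."""
    envelope = pickle.loads(blob)
    return {k: v for k, v in envelope.items() if k != "payload"}

# Illustrative values only.
blob = archive({"status": 200}, commit="abc123", dependencies={"requests": "2.0.0"})
meta = read_metadata(blob)
```

With the metadata readable on its own, you can rebuild the matching environment first and unpickle the payload only afterwards.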

For now, though, our requirement is essentially to pickle a requests.Response object, and I think we can rely on the backward compatibility of requests. If requests ever makes a breaking change, then even if our pickles still loaded, our code would no longer be compatible. We can consider other strategies when that happens.

Statement
This article is reproduced from 卡瓦邦噶. If there is any infringement, please contact admin@php.cn to request removal.