How to implement Huffman coding algorithm using Python?

王林

Release: 2023-09-20 10:49:44

Abstract:
Huffman coding is a classic lossless data compression algorithm. It assigns each character a variable-length binary code based on how often the character occurs, so that frequent characters take fewer bits. This article shows how to implement the Huffman coding algorithm in Python and provides concrete code examples.

Understanding the idea of Huffman coding
The core idea of Huffman coding is to give shorter codes to characters that appear more frequently and longer codes to characters that appear less frequently, which reduces the total length of the encoded data. Concretely, the algorithm builds a binary tree (the Huffman tree) from the character frequencies, with one leaf per character. Each left branch stands for 0 and each right branch for 1, so the path from the root to a leaf spells out that character's code. Because characters sit only at leaves, no code is a prefix of another, and the encoded bit stream can be decoded unambiguously.
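To make the saving concrete, here is a small worked sketch. The code table is chosen by hand purely for illustration (it is not produced by the algorithm yet), and the 8-bits-per-character baseline is an assumption about the fixed-length encoding being compared against.

```python
from collections import Counter

text = "aaaabbc"

# Hypothetical prefix-free code table, written by hand for this example:
# the most frequent character gets the shortest code.
codes = {"a": "0", "b": "10", "c": "11"}

counts = Counter(text)

# Total bits when every character is replaced by its variable-length code.
variable_bits = sum(counts[ch] * len(codes[ch]) for ch in counts)

# Baseline: one 8-bit byte per character (an assumption for comparison).
fixed_bits = 8 * len(text)

print(variable_bits, fixed_bits)
```

Here the variable-length codes need 4·1 + 2·2 + 1·2 = 10 bits, compared with 56 bits for the fixed-length baseline.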
Building a Huffman tree
Before we can encode anything, we need to build a Huffman tree. First, count the frequency of each character in the string and store the characters and their frequencies in a frequency dictionary. Then, build a Huffman tree based on the frequency dictionary. The specific steps are as follows:
Initialize a priority queue (min-heap) for storing Huffman tree nodes
Add each character in the frequency dictionary, together with its frequency, to the priority queue as a leaf node
Repeat the following operations until there is only one node left in the queue:
- Remove the two nodes with the smallest frequencies from the queue and make them the left and right children of a new node whose frequency is the sum of their frequencies
- Add the new node to the queue
The one node remaining in the queue is the root node of the Huffman tree
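The frequency dictionary mentioned in the first step can be produced with `collections.Counter` from the standard library. The following is a minimal sketch; the helper name `count_frequencies` is hypothetical and does not appear elsewhere in this article.

```python
from collections import Counter


def count_frequencies(text):
    """Return a plain dict mapping each character to its number of occurrences."""
    return dict(Counter(text))


freq_dict = count_frequencies("abracadabra")
print(freq_dict)
```

The resulting dictionary has the `{character: frequency}` shape that the tree-building step expects.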

The following is a code example:

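import heapq


class Node:
    """A Huffman tree node; leaf nodes carry a character in `value`."""

    def __init__(self, frequency, value=None):
        self.frequency = frequency
        self.value = value  # None for internal nodes
        self.left_child = None
        self.right_child = None

    def __lt__(self, other):
        # heapq orders items with '<', so compare nodes by frequency
        return self.frequency < other.frequency


def build_huffman_tree(freq_dict):
    """Build a Huffman tree from a {character: frequency} dict and return its root."""
    if not freq_dict:
        raise ValueError("freq_dict must contain at least one character")

    priority_queue = []

    # Every character starts out as a leaf node in the min-heap
    for char, freq in freq_dict.items():
        heapq.heappush(priority_queue, Node(freq, char))

    # Repeatedly merge the two least frequent nodes under a new parent
    while len(priority_queue) > 1:
        left_child = heapq.heappop(priority_queue)
        right_child = heapq.heappop(priority_queue)
        new_node = Node(left_child.frequency + right_child.frequency)
        new_node.left_child = left_child
        new_node.right_child = right_child
        heapq.heappush(priority_queue, new_node)

    # The single remaining node is the root of the tree
    return heapq.heappop(priority_queue)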

Generate Huffman coding table
Once the Huffman tree is built, we can generate the corresponding Huffman coding table from it. The Huffman coding table maps each character to its code. The specific steps are as follows:
Traverse the Huffman tree from the root node, appending 0 for each left branch and 1 for each right branch, and record the accumulated path whenever a leaf node is reached
Store each leaf's character and its path (the code) in the encoding dictionary

The following is a code example:

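def generate_huffman_codes(huffman_tree):
    """Return a dict mapping each character to its code as a '0'/'1' string."""
    code_dict = {}

    def traverse(node, current_code=''):
        # A leaf has no children; testing the children instead of the
        # truthiness of node.value also handles falsy values correctly
        if node.left_child is None and node.right_child is None:
            # A tree with a single leaf would give an empty code, so use '0'
            code_dict[node.value] = current_code or '0'
        else:
            traverse(node.left_child, current_code + '0')   # left branch -> 0
            traverse(node.right_child, current_code + '1')  # right branch -> 1

    traverse(huffman_tree)
    return code_dict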
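
A property worth checking on any coding table is that it is prefix-free: no code may be the beginning of another code, otherwise decoding would be ambiguous. The following is a minimal sketch of such a check; `is_prefix_free` is a hypothetical helper name, and the two sample tables are written by hand for illustration.

```python
def is_prefix_free(code_dict):
    """Return True if no code in the table is a prefix of another code."""
    codes = sorted(code_dict.values())
    # In sorted order, if any code is a prefix of another, then some code is
    # a prefix of its immediate successor, so comparing neighbours is enough.
    return all(not later.startswith(earlier)
               for earlier, later in zip(codes, codes[1:]))


good_table = {"a": "0", "b": "10", "c": "11"}
bad_table = {"a": "0", "b": "01", "c": "11"}  # "0" is a prefix of "01"

print(is_prefix_free(good_table), is_prefix_free(bad_table))
```

A table generated from a Huffman tree with at least two leaves should always pass this check, because characters are stored only at leaf nodes.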

Compress and decompress data
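With the Huffman coding table in hand, we can compress the original data by replacing each character with its Huffman code. In the example that follows, the encoded bits are kept as a string of '0' and '1' characters; to actually save space, these bits would have to be packed into bytes before being written to a file. To decompress, we walk the Huffman tree bit by bit, going left on 0 and right on 1, and emit a character each time a leaf node is reached. The decoder therefore needs the same tree (or coding table) that was used for compression.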

The following is a code example for compressing and decompressing data:

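def compress_data(data, code_dict):
    """Replace every character with its code and join the codes into one string."""
    return ''.join(code_dict[char] for char in data)


def decompress_data(compressed_data, huffman_tree):
    """Decode a '0'/'1' string by walking the tree from the root."""
    # Special case: a tree with a single leaf uses one bit per character
    if huffman_tree.left_child is None and huffman_tree.right_child is None:
        return huffman_tree.value * len(compressed_data)

    decoded_chars = []
    current_node = huffman_tree
    for bit in compressed_data:
        if bit == '0':
            current_node = current_node.left_child
        else:
            current_node = current_node.right_child

        # Reaching a leaf completes one character; restart at the root
        if current_node.left_child is None and current_node.right_child is None:
            decoded_chars.append(current_node.value)
            current_node = huffman_tree

    return ''.join(decoded_chars)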
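
The functions above work on a string of '0' and '1' characters, which is easy to read but spends a whole character on every bit. The following is a minimal sketch of packing such a string into real bytes. The helper names `pack_bits` and `unpack_bits` are hypothetical, and returning the padding count next to the data is only one possible convention for remembering how many filler bits were added.

```python
def pack_bits(bit_string):
    """Pack a '0'/'1' string into bytes; return (data, padding_bit_count)."""
    # Pad with zeros on the right so the length is a multiple of 8.
    padding = (8 - len(bit_string) % 8) % 8
    padded = bit_string + "0" * padding
    data = bytes(int(padded[i:i + 8], 2) for i in range(0, len(padded), 8))
    return data, padding


def unpack_bits(data, padding):
    """Inverse of pack_bits: rebuild the original '0'/'1' string."""
    bit_string = "".join(f"{byte:08b}" for byte in data)
    # Drop the filler bits that pack_bits appended.
    return bit_string[:len(bit_string) - padding] if padding else bit_string


packed, padding = pack_bits("0101100111")
restored = unpack_bits(packed, padding)
print(packed, padding, restored)
```

The packed bytes and the padding count could then be written to a file together with the frequency dictionary, so that the tree can be rebuilt before decoding.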

Summary:
This article introduces how to use Python to implement the Huffman coding algorithm. The main steps include building Huffman trees, generating Huffman coding tables, and compressing and decompressing data. We hope that the introduction and code examples in this article can help readers better understand and apply the Huffman coding algorithm.
