Why do Python and Go zlib generate different compressed output for the same input?

DDD
Release: 2024-10-29 06:16:02

Golang vs Python zlib: Dissecting the Output Differences

In the provided code snippets, you're attempting to compress a string using both Python's zlib and Go's flate package. However, your Python implementation yields a different output than the Go counterpart. Why is this the case?

To assist in debugging, let's analyze the relevant code fragments:

Go Implementation (compress.go)

<code class="go">package main

import (
    "compress/flate"
    "bytes"
    "fmt"
)

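func compress(source string) []byte {
    buf := new(bytes.Buffer)
    // compress/flate writes a raw DEFLATE stream: no zlib header, no checksum.
    w, err := flate.NewWriter(buf, 7)
    if err != nil {
        panic(err) // NewWriter only fails for an invalid compression level
    }

    w.Write([]byte(source))
    // Close flushes pending data and terminates the stream with a final block.
    w.Close()

    return buf.Bytes()
}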

func main() {
    example := "foo"
    compressed := compress(example)
    fmt.Println(compressed)
}</code>

The key step in the Go code is closing the Writer, which flushes any pending compressed data and terminates the DEFLATE stream. Go's compress/flate writes raw DEFLATE, so no header or checksum is involved.

Python Implementation (compress.py)

<code class="python">from __future__ import print_function

import zlib


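def compress(source):
    # wbits=-15 selects a raw DEFLATE stream (no zlib header or checksum),
    # which is the format Go's compress/flate produces
    compressor = zlib.compressobj(7, zlib.DEFLATED, -15)
    # zlib works on bytes, and compress() may return output that must be kept
    data = compressor.compress(source.encode("utf-8"))
    # flush() defaults to Z_FINISH, but
    # https://golang.org/pkg/compress/flate/#Writer.Flush
    # says "Flush is equivalent to Z_SYNC_FLUSH"
    return data + compressor.flush(zlib.Z_SYNC_FLUSH)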


def main():
    example = u"foo"
    compressed = compress(example)
    print(list(bytearray(compressed)))


if __name__ == "__main__":
    main()</code>

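Here, you've explicitly flushed the compressor by calling compressor.flush(zlib.Z_SYNC_FLUSH). A sync flush writes out everything compressed so far but leaves the stream open, so the result is not a complete DEFLATE stream.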

Dissecting the Output

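The two outputs differ only in the fifth byte: it is 0 in Python and 4 in Go. Neither side writes a header or checksum, because both produce raw DEFLATE. The difference lies in how each stream ends. Python's Z_SYNC_FLUSH appends an empty stored block that is not marked as final, so the stream stays open. Go's Close() also appends an empty stored block but sets its final-block bit, which terminates the stream; that bit is what turns the fifth byte from 0 into 4.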
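To make the fifth-byte difference concrete, here is a minimal sketch that reads a DEFLATE block header out of a single byte. The helper name `stored_block_header` is hypothetical, and the default `bit_offset=2` is an assumption: it holds when the preceding block is a fixed-Huffman block of three literals, as in this example, so that two bits of its end-of-block code spill into the fifth byte.

```python
def stored_block_header(byte_value, bit_offset=2):
    """Return (BFINAL, BTYPE) for a block header that starts at bit_offset.

    DEFLATE packs header bits starting from the least significant bit.
    """
    bits = byte_value >> bit_offset
    bfinal = bits & 1           # 1 marks the last block of the stream
    btype = (bits >> 1) & 0b11  # 0 means a stored (uncompressed) block
    return bfinal, btype


# Fifth byte of each output, as described above.
print(stored_block_header(0))  # Python, Z_SYNC_FLUSH
print(stored_block_header(4))  # Go, Close()
```

Both bytes describe a stored block; only the final-block flag differs between them.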

Bridging the Output Gap

To obtain comparable output from both implementations, you can either:

  1. Use Flush() in Go: Replace w.Close() with w.Flush() so that Go performs a sync flush, as the Python code does, instead of terminating the stream.

    <code class="go">buf := new(bytes.Buffer)
    w, _ := flate.NewWriter(buf, 7)
    w.Write([]byte(source))
    w.Flush()

    return buf.Bytes()</code>
  2. Tweak Python's zlib settings: A negative wbits value such as -15 already makes Python's zlib emit raw DEFLATE without a header or checksum, and calling flush() with its default Z_FINISH mode terminates the stream. Even so, the result is not guaranteed to match Go's Close() output byte for byte.
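As a rough illustration of the second option, the sketch below compresses the same input in Python with Z_FINISH and with Z_SYNC_FLUSH. The helper name `raw_deflate` is hypothetical. It relies on the one-shot `zlib.decompress` accepting only a terminated stream, which makes the difference between the two flush modes visible.

```python
import zlib


def raw_deflate(data, flush_mode):
    """Compress data as raw DEFLATE (wbits=-15) and end with flush_mode."""
    c = zlib.compressobj(7, zlib.DEFLATED, -15)
    return c.compress(data) + c.flush(flush_mode)


finished = raw_deflate(b"foo", zlib.Z_FINISH)
synced = raw_deflate(b"foo", zlib.Z_SYNC_FLUSH)

# A finished stream decodes in one shot.
print(zlib.decompress(finished, -15))

# A sync-flushed stream is still open, so one-shot decompression rejects it.
try:
    zlib.decompress(synced, -15)
    sync_is_complete = True
except zlib.error:
    sync_is_complete = False
print(sync_is_complete)
```

A streaming decoder created with `zlib.decompressobj(-15)` can still read the sync-flushed bytes, because everything compressed so far has been written out.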

Conclusion

While you might be able to tweak parameters to force a byte-for-byte match between the two implementations, doing so is neither necessary nor desirable. Compression libraries guarantee only that their output is compatible, meaning any conforming decompressor can read it, not that it is identical.
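In practice, this suggests comparing decompressed payloads rather than compressed bytes. The sketch below is one way to do that in Python. The helper name `inflate_raw` is hypothetical, and the `GO_OUTPUT` bytes are an assumption, reconstructed from the description above (the Python output with 4 as the fifth byte) rather than captured from a Go run.

```python
import zlib


def inflate_raw(stream):
    """Decode a raw DEFLATE stream, whether or not it has been terminated."""
    d = zlib.decompressobj(-15)
    return d.decompress(stream) + d.flush()


# Assumption: Go's Close() output for "foo", reconstructed from the article.
GO_OUTPUT = bytes(bytearray([74, 203, 207, 7, 4, 0, 0, 255, 255]))

# Python side: same settings as compress.py (level 7, raw DEFLATE, sync flush).
c = zlib.compressobj(7, zlib.DEFLATED, -15)
python_output = c.compress(b"foo") + c.flush(zlib.Z_SYNC_FLUSH)

# Compare what the streams decode to, not the streams themselves.
print(inflate_raw(GO_OUTPUT) == inflate_raw(python_output))
```

A check like this keeps working if either library changes its block layout, as long as both still emit valid DEFLATE.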
