Golang generates consistent hashes for jpeg images without writing to disk

WBOY
Release: 2024-02-11 16:33:08

When comparing images for recognition or deduplication, a common approach is to compute a hash for each image. The obvious way to hash a JPEG is to write it to disk, read the file back, and hash the bytes. In Go, the hash can instead be computed while the JPEG is being encoded, so the file never has to be read back. This article walks through a question about inconsistent JPEG hashes and a workaround for it.

Question content

I'm new to image processing in Go.

I'm trying to generate consistent hashes for JPEG images. I hash the raw RGBA pixel bytes, write the image to disk as a JPEG, then reload it and hash the raw pixel bytes again, and the two hashes differ. Writing the RGBA image to disk as a JPEG modifies the pixels (which is expected), so the hash I calculated earlier no longer matches.

Hashing the file itself, as in hash("abc.jpeg"), means I have to write to disk, read the file back, generate the hash, and so on.

  • Is there any setting that can be used to control how the output JPEG pixels behave when reading and writing?
  • Should I use *image.RGBA? The input image is *image.YCbCr.

// Open the input image file
inputFile, _ := os.Open("a.jpg")
defer inputFile.Close()

// Decode the input image
inputImage, _, _ := image.Decode(inputFile)

// Get the dimensions of the input image
width := inputImage.Bounds().Dx()
height := inputImage.Bounds().Dy()
subWidth := width / 4
subHeight := height / 4

// Create a new image
subImg := image.NewRGBA(image.Rect(0, 0, subWidth, subHeight))
draw.Draw(subImg, subImg.Bounds(), inputImage, image.Point{0, 0}, draw.Src)

// I'd want the hashes to be the same before the write and after the read, but they always differ
hash1 := sha256.Sum256(imageToBytes(subImg))
fmt.Printf("<---OUT [%s] %x\n", filename, hash1)
jpg, _ := os.Create("mytest.jpg")
_ = jpeg.Encode(jpg, subImg, nil)
jpg.Close()

// Upon reading it back in, the pixels are ever so slightly different
f, _ := os.Open("mytest.jpg")
img, _, _ := image.Decode(f)
jpg_input := image.NewRGBA(img.Bounds())
draw.Draw(jpg_input, img.Bounds(), img, image.Point{0, 0}, draw.Src)
hash2 := sha256.Sum256(imageToBytes(jpg_input))
fmt.Printf("--->IN  [%s] %x\n", filename, hash2)


// Real-world use case:
// generate a subtile of a large image plus its hash
// if the hash is in the database
//    pixel walk to see if a hash collision occurred
//    if pixels are different
//       deal with it...
//    else
//       object.filename = dbase.filename
// else
//    add filename to the database with the hash as the lookup key
//    write the JPEG to disk
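
The effect described in the question can be reproduced without touching the disk. The following is a minimal sketch, not the asker's code: noisyTile, pixelHash, and roundTrip are hypothetical helpers, and the test image is synthetic noise. It encodes an RGBA image to JPEG in memory, decodes it again, and compares hashes of the raw pixel buffers. Even at Quality 100 the two pixel hashes should differ, because JPEG encoding in image/jpeg is lossy.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"fmt"
	"image"
	"image/color"
	"image/draw"
	"image/jpeg"
)

// noisyTile builds a deterministic pseudo-random RGBA image.
// It is hypothetical test data standing in for a real image tile.
func noisyTile(w, h int) *image.RGBA {
	img := image.NewRGBA(image.Rect(0, 0, w, h))
	seed := uint32(1)
	next := func() uint8 {
		seed = seed*1664525 + 1013904223 // simple linear congruential generator
		return uint8(seed >> 24)
	}
	for y := 0; y < h; y++ {
		for x := 0; x < w; x++ {
			img.SetRGBA(x, y, color.RGBA{R: next(), G: next(), B: next(), A: 255})
		}
	}
	return img
}

// pixelHash hashes the raw RGBA pixel buffer of img.
func pixelHash(img *image.RGBA) [sha256.Size]byte {
	return sha256.Sum256(img.Pix)
}

// roundTrip JPEG-encodes img into memory at maximum quality,
// decodes the result, and converts it back to RGBA.
func roundTrip(img *image.RGBA) (*image.RGBA, error) {
	var buf bytes.Buffer
	if err := jpeg.Encode(&buf, img, &jpeg.Options{Quality: 100}); err != nil {
		return nil, err
	}
	decoded, err := jpeg.Decode(&buf)
	if err != nil {
		return nil, err
	}
	out := image.NewRGBA(decoded.Bounds())
	draw.Draw(out, out.Bounds(), decoded, decoded.Bounds().Min, draw.Src)
	return out, nil
}

func main() {
	before := noisyTile(32, 32)
	after, err := roundTrip(before)
	if err != nil {
		panic(err)
	}
	fmt.Printf("before: %x\n", pixelHash(before))
	fmt.Printf("after:  %x\n", pixelHash(after))
	fmt.Println("pixel hashes equal:", pixelHash(before) == pixelHash(after))
}
```

Since the decoded pixels are not a stable identity for the image, a hash over the encoded JPEG bytes is the more dependable lookup key.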

Workaround

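You can make a hash one of the writer's targets: io.MultiWriter sends the encoded JPEG bytes to both the file and the hash, so the hash is calculated while the file is being written. The result is the hash of the encoded file contents, not of the decoded pixels, so it matches what you would get by hashing the file later: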

hash := sha256.New()
jpeg.Encode(io.MultiWriter(file, hash), img, nil)
hashValue := hash.Sum(nil)
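
As a fuller illustration of the same idea, here is a self-contained sketch that checks the streamed hash against a hash of the stored bytes. The helper names (gradient, encodeAndHash, streamedAndStored) are assumptions made for this example, and a bytes.Buffer stands in for the output file so the program runs without touching the disk. With a real *os.File as the destination the call would look the same.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"fmt"
	"image"
	"image/color"
	"image/jpeg"
	"io"
)

// gradient builds a small synthetic image (hypothetical stand-in for a real tile).
func gradient(w, h int) *image.RGBA {
	img := image.NewRGBA(image.Rect(0, 0, w, h))
	for y := 0; y < h; y++ {
		for x := 0; x < w; x++ {
			img.SetRGBA(x, y, color.RGBA{R: uint8(x * 4), G: uint8(y * 4), B: 128, A: 255})
		}
	}
	return img
}

// encodeAndHash JPEG-encodes img into dst and returns the SHA-256
// of exactly the bytes that were written to dst.
func encodeAndHash(dst io.Writer, img image.Image) ([sha256.Size]byte, error) {
	var sum [sha256.Size]byte
	h := sha256.New()
	// Every encoded byte goes to both dst and the hash.
	if err := jpeg.Encode(io.MultiWriter(dst, h), img, nil); err != nil {
		return sum, err
	}
	copy(sum[:], h.Sum(nil))
	return sum, nil
}

// streamedAndStored returns the hash computed during encoding and the
// hash computed afterwards over the stored bytes, for comparison.
func streamedAndStored(img image.Image) (streamed, stored [sha256.Size]byte, err error) {
	var file bytes.Buffer // stands in for an *os.File
	streamed, err = encodeAndHash(&file, img)
	if err != nil {
		return streamed, stored, err
	}
	stored = sha256.Sum256(file.Bytes())
	return streamed, stored, nil
}

func main() {
	streamed, stored, err := streamedAndStored(gradient(64, 64))
	if err != nil {
		panic(err)
	}
	fmt.Printf("streamed: %x\n", streamed)
	fmt.Printf("stored:   %x\n", stored)
	fmt.Println("match:", streamed == stored)
}
```

Because the hash covers the encoded bytes, it can be used as the database key right after encoding, and re-hashing the file on disk later gives the same value as long as the file is unchanged.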

source:stackoverflow.com