How to fix garbled text when writing a crawler in golang
When writing a crawler in golang, you may run into pages whose encoding is gb2312.
The page itself declares its character encoding as gb2312,
but golang assumes UTF-8 by default, so text crawled directly from such a page comes out as garbled characters.
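To make the cause concrete, here is a minimal sketch using only the standard library. The sample bytes are an illustrative assumption (the gb2312 encoding of the two characters "中文"), and looksLikeUTF8 is a hypothetical helper name, not something from the page being crawled.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// gb2312Sample holds the gb2312 bytes for the two characters "中文".
// Illustrative data only; a real crawler would get bytes like these
// from the HTTP response body.
var gb2312Sample = []byte{0xD6, 0xD0, 0xCE, 0xC4}

// looksLikeUTF8 is a hypothetical helper that reports whether b is a
// valid UTF-8 byte sequence.
func looksLikeUTF8(b []byte) bool {
	return utf8.Valid(b)
}

func main() {
	// The gb2312 bytes are not valid UTF-8, so treating them as a Go
	// string directly yields replacement characters when printed.
	fmt.Println(looksLikeUTF8(gb2312Sample))

	// The same two characters written in a Go source file are UTF-8.
	fmt.Println(looksLikeUTF8([]byte("中文")))
}
```

A check like this can also serve as a quick guard in a crawler: if the body is already valid UTF-8, no conversion is needed.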
Solution:
Use the github.com/axgle/mahonia package, which can perform the encoding conversion.
1. Run go get github.com/axgle/mahonia to download the package. After the command finishes, the package is located in the
github.com\axgle\mahonia
directory.
2. How to use the code
1) Import package
import "github.com/axgle/mahonia"
2) Conversion function
func ConvertToString(src string, srcCode string, tagCode string) string {
    // Decode src from the source encoding into a UTF-8 Go string.
    srcCoder := mahonia.NewDecoder(srcCode)
    srcResult := srcCoder.ConvertString(src)
    // Pass the result through the decoder for the target encoding.
    tagCoder := mahonia.NewDecoder(tagCode)
    _, cdata, _ := tagCoder.Translate([]byte(srcResult), true)
    result := string(cdata)
    return result
}
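3) Call this function wherever a string needs its encoding converted. Here html holds the fetched page content, and "gbk" works as the source encoding because GBK is a superset of gb2312.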
result = ConvertToString(html, "gbk", "utf-8")
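The call above hard-codes the source encoding. As a possible refinement, a crawler could read the encoding from the page's own meta tag and pass it as the srcCode argument. The following is a minimal sketch using only the standard library; detectCharset and charsetRe are hypothetical names introduced here, and the regular expression is a simple assumption that covers common meta tag forms, not a full HTML parser.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// charsetRe matches charset=... inside a <meta> tag, for example
// <meta charset="gb2312"> or
// <meta http-equiv="Content-Type" content="text/html; charset=gb2312">.
var charsetRe = regexp.MustCompile(`(?i)<meta[^>]*charset\s*=\s*["']?([a-z0-9_-]+)`)

// detectCharset is a hypothetical helper: it returns the lower-cased
// charset declared in the page's <meta> tag, or fallback if none is found.
// The tag itself is plain ASCII, so it can be matched even before the
// page body has been converted to UTF-8.
func detectCharset(html string, fallback string) string {
	m := charsetRe.FindStringSubmatch(html)
	if m == nil {
		return fallback
	}
	return strings.ToLower(m[1])
}

func main() {
	page := `<html><head><meta http-equiv="Content-Type" content="text/html; charset=gb2312"></head></html>`
	fmt.Println(detectCharset(page, "utf-8"))
}
```

The detected value could then replace the literal "gbk" in the ConvertToString call, with "utf-8" as the fallback when the page declares nothing.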