Removing Invalid UTF-8 Characters from Strings in Go
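Marshaling a list of strings with json.Marshal can go wrong when one of the strings contains bytes that are not valid UTF-8. Before Go 1.2, Marshal returned an InvalidUTF8Error in that case; since Go 1.2 it instead replaces each invalid byte with the Unicode replacement character U+FFFD, which is rarely the output you want. This article shows how to remove or replace such bytes yourself in Go.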
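Before cleaning anything, it can help to find out which strings are affected. The following is a minimal sketch using utf8.ValidString from the standard library; the helper name invalidIndexes and the sample data are hypothetical and only serve as illustration.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// invalidIndexes is a hypothetical helper: it returns the positions of the
// strings in list that are not valid UTF-8.
func invalidIndexes(list []string) []int {
	var bad []int
	for i, s := range list {
		if !utf8.ValidString(s) {
			bad = append(bad, i)
		}
	}
	return bad
}

func main() {
	// Sample data (assumed): only the middle string contains a stray byte.
	list := []string{"ok", "a\xc5z", "héllo"}
	fmt.Println(invalidIndexes(list)) // [1]
}
```

A check like this is cheap, so it can run before marshaling to decide whether any cleanup is needed at all.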
In Python, you choose an error handler such as "strict", "replace", or "ignore" when decoding bytes into a string. Go has no such decoding step: a Go string may hold arbitrary bytes, so invalid sequences have to be removed or replaced explicitly. The standard library offers two ways to do that.
Using strings.ToValidUTF8 in Go 1.13
To remove invalid UTF-8 bytes from a string, you can use the strings.ToValidUTF8 function introduced in Go 1.13. It takes two parameters: the input string and a replacement string. Each run of consecutive invalid bytes is replaced by one copy of the replacement string. If the replacement is the empty string, the invalid bytes are simply removed:
invalidString := "a\xc5z"
validString := strings.ToValidUTF8(invalidString, "")
// validString is now "az"
Using strings.Map and utf8.RuneError in Go 1.11
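An alternative is to combine strings.Map with utf8.RuneError. strings.Map calls a function on every rune of a string and drops the rune whenever the function returns a negative value. While the string is decoded, each invalid byte shows up as utf8.RuneError (U+FFFD), so returning -1 for that rune removes the invalid bytes. Note that a correctly encoded U+FFFD in the input is removed as well. Here's an example: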
invalidString := "a\xc5z"
fixUtf := func(r rune) rune {
	if r == utf8.RuneError {
		return -1 // a negative value tells strings.Map to drop the rune
	}
	return r
}
validString := strings.Map(fixUtf, invalidString)
fmt.Println(validString) // prints: az