How to Remove Diacritics from UTF-8 Strings in Go?

Barbara Streisand
Release: 2024-12-09 01:53:11

Removing Diacritics in Go Using Text Normalization Libraries

How can you remove diacritics from UTF-8 encoded strings in Go? For instance, how do you transform the string "žůžo" into "zuzo"?

Solution:

Go's standard library does not include Unicode normalization, but the supplementary golang.org/x/text module does (install it with go get golang.org/x/text). Chaining a few of its transformers removes the diacritics:

package main

import (
    "fmt"
    "log"
    "unicode"

    "golang.org/x/text/runes"
    "golang.org/x/text/transform"
    "golang.org/x/text/unicode/norm"
)

func main() {
    // Decompose (NFD), drop nonspacing marks (category Mn), then recompose (NFC).
    // runes.Remove replaces transform.RemoveFunc, which is deprecated.
    t := transform.Chain(norm.NFD, runes.Remove(runes.In(unicode.Mn)), norm.NFC)

    result, _, err := transform.String(t, "žůžo")
    if err != nil {
        log.Fatal(err)
    }
    fmt.Println(result) // zuzo
}

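This code chains three transformations. NFD (Normalization Form D) decomposes each accented character into a base letter followed by one or more combining marks. The middle step removes every rune in the Unicode category Mn (nonspacing marks). NFC (Normalization Form C) then recomposes whatever remains. Note that characters with no canonical decomposition, such as "ø" or "ł", pass through unchanged, so this approach strips combining accents rather than every accent-like feature.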
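
To see what the Mn-removal step does in isolation, here is a minimal sketch that uses only the standard library. It assumes the input is already in decomposed (NFD) form, so it is not a replacement for the normalization chain; stripMarks is a hypothetical helper name chosen for illustration.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// stripMarks drops every rune in Unicode category Mn (nonspacing mark).
// It does NOT decompose anything: the input is assumed to be in NFD form.
// The name is hypothetical and not part of any library.
func stripMarks(decomposed string) string {
	return strings.Map(func(r rune) rune {
		if unicode.Is(unicode.Mn, r) {
			return -1 // a negative value tells strings.Map to drop the rune
		}
		return r
	}, decomposed)
}

func main() {
	// "žůžo" written out in decomposed form: each base letter is followed
	// by its combining mark (U+030C caron, U+030A ring above).
	decomposed := "z\u030Cu\u030Az\u030Co"

	// Show which runes the filter will treat as marks.
	for _, r := range decomposed {
		fmt.Printf("%U Mn=%t\n", r, unicode.Is(unicode.Mn, r))
	}
	fmt.Println(stripMarks(decomposed)) // zuzo

	// A precomposed "ž" (U+017E) is a single letter rune, not a mark,
	// so the filter alone leaves it untouched.
	fmt.Println(stripMarks("\u017E") == "\u017E") // true
}
```

The last line shows why the NFD step has to come first: without decomposition there is no separate mark for the filter to remove.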
