Remove Accents/Diacritics in a String in JavaScript
To remove accents (diacritics) from a string, normalize it to a decomposed Unicode form and then strip the combining marks with a regular expression. Here's a detailed guide on how to achieve this:
ES2015/ES6 Solution with String.prototype.normalize()
    const str = "Crème Brûlée";
    const accentedCharsRegex = /[\u0300-\u036f]/g;
    const normalizedStr = str.normalize("NFD");
    const accentsRemovedStr = normalizedStr.replace(accentedCharsRegex, "");
    console.log(accentsRemovedStr); // "Creme Brulee"
Here, the normalize("NFD") method decomposes precomposed characters (e.g., è) into their constituent parts (the base letter e and the combining grave accent U+0300). The regular expression /[\u0300-\u036f]/g then matches every character in the Combining Diacritical Marks block (U+0300 to U+036F), and replace() removes them, leaving only the base letters.
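The two steps can be wrapped in a small helper. The following is a minimal sketch; the function name removeAccents is hypothetical and not part of any library. It only affects characters that have a canonical decomposition: letters such as Ł or ø are encoded as distinct letters rather than a base letter plus a combining mark, so they pass through unchanged.

```javascript
// Hypothetical helper: decompose with NFD, then strip combining marks
// in the range U+0300 to U+036F.
function removeAccents(input) {
  return input.normalize("NFD").replace(/[\u0300-\u036f]/g, "");
}

console.log(removeAccents("Crème Brûlée")); // "Creme Brulee"
console.log(removeAccents("Łódź"));         // "Łodz" (Ł has no decomposition)
```

If letters like Ł or ø also need to be mapped to ASCII, that would require an explicit lookup table on top of this approach.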
Unicode Property Escape Method
Since ES2018, you can use Unicode property escapes for a more concise approach:
    const str = "Crème Brûlée";
    const accentsRemovedStr = str.normalize("NFD").replace(/\p{Diacritic}/gu, "");
    console.log(accentsRemovedStr); // "Creme Brulee"
This method uses the \p{Diacritic} property escape to match all diacritical marks instead of defining a specific Unicode range. The u flag is required; without it, \p is not interpreted as a property escape.
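One caveat worth checking before adopting this pattern: the Diacritic property is broader than the combining-marks range, and it also covers standalone ASCII characters such as the caret (^) and the grave accent. The sketch below uses a made-up input string to contrast it with \p{Mn} (nonspacing marks). That alternative is not part of the approach described above and is offered only as one possible option.

```javascript
// Made-up sample that mixes accented letters with a literal caret.
const sample = "Crème 2^10";

// \p{Diacritic} also matches the standalone caret, so it is removed too.
const viaDiacritic = sample.normalize("NFD").replace(/\p{Diacritic}/gu, "");
console.log(viaDiacritic); // "Creme 210"

// \p{Mn} matches only nonspacing combining marks, so the caret survives.
const viaNonspacingMark = sample.normalize("NFD").replace(/\p{Mn}/gu, "");
console.log(viaNonspacingMark); // "Creme 2^10"
```

Both properties reach beyond U+0300 to U+036F and can touch marks in non-Latin scripts, so the explicit range may still be the safer choice for Latin-only input.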
Sorting with Intl.Collator
If your primary goal is to sort or compare accented strings rather than to change them, consider using Intl.Collator, which handles accented characters in a locale-aware way and leaves the strings themselves untouched:
    const strArr = ["crème brûlée", "crame brulai", "creme brulee", "crexe brulee", "crome brouillé"];
    const collator = new Intl.Collator();
    const sortedArr = strArr.sort(collator.compare);
    console.log(sortedArr);
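By default, a collator created for sorting uses sensitivity "variant", so differences in base letters, accents, and case all affect the comparison. Accents act as a tie-breaker, which means accented strings sort next to their unaccented counterparts rather than being treated as identical. To compare accent-insensitively, pass a sensitivity option when instantiating Intl.Collator: "base" ignores both accents and case, while "case" ignores accents but still distinguishes case.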
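As a rough illustration of how the sensitivity option changes comparisons, the sketch below builds two collators. The "en" locale, the variable names, and the sample strings are assumptions chosen for the example.

```javascript
// Default sensitivity for sorting is "variant": accents and case both count.
const strictCollator = new Intl.Collator("en");
// "base" treats letters that differ only by accent or case as equal.
const baseCollator = new Intl.Collator("en", { sensitivity: "base" });

console.log(strictCollator.compare("creme", "crème") === 0); // false
console.log(baseCollator.compare("creme", "crème") === 0);   // true
console.log(baseCollator.compare("creme", "CRÈME") === 0);   // true

// Accent-insensitive lookup without modifying the stored strings.
const menu = ["Crème Brûlée", "Tarte Tatin"];
const found = menu.find((item) => baseCollator.compare(item, "creme brulee") === 0);
console.log(found); // "Crème Brûlée"
```

This keeps the original text intact for display while still letting a plain-ASCII query match it, which can be preferable to stripping accents when the goal is search or comparison.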