How to Match Non-ASCII Characters with Word Boundaries in JavaScript Regex?

Barbara Streisand

Release: 2024-10-27 04:46:29

Matching Non-ASCII Characters in JavaScript Regex with Word Boundaries

In JavaScript, a regular expression that uses the word-boundary assertion (\b) runs into limitations with non-ASCII characters such as the Finnish letters ä, ö, and å. To match words that begin with these characters, we need a different approach.

Consider the following code:

<code class="javascript">var title = "this is simple string with finnish word tämä on ääkköstesti älkää ihmetelkö";
var searchterm = "äl";

// The backslash is doubled because "\b" in a string literal is a backspace character.
if (new RegExp("\\b" + searchterm, "gi").test(title)) {
  // Never reached for "äl"
}</code>

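This code tries to match the term "äl" at the start of a word in the title using \b. It fails because \b is defined in terms of \w, which covers only the ASCII characters [A-Za-z0-9_]. Since "ä" is not a word character by that definition, the position between a space and "ä" is not a word boundary.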
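
The failure can be reproduced in isolation. The following is a minimal sketch, assuming a regex without the u or v flag; the variable names and the shortened title are illustrative only.

```javascript
// \w (and therefore \b) only treats [A-Za-z0-9_] as word characters.
const isWordChar = (ch) => /\w/.test(ch);

const asciiLetter = isWordChar("a");   // true
const finnishLetter = isWordChar("ä"); // false: "ä" is outside [A-Za-z0-9_]

const title = "tämä on ääkköstesti älkää ihmetelkö";

// " ä" is a non-word character followed by a non-word character,
// so \b finds no boundary in front of "älkää".
const boundaryBeforeUmlaut = /\bäl/i.test(title); // false

// In "tämä", "t" is a word character and "ä" is not, so \b DOES
// match between them, in the middle of the word.
const boundaryInsideWord = /\bämä/i.test(title); // true

console.log({ asciiLetter, finnishLetter, boundaryBeforeUmlaut, boundaryInsideWord });
```

The last check shows the other side of the problem: \b not only misses real word starts, it can also report a boundary inside a word that mixes ASCII and non-ASCII letters.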

Solution: Non-Capturing Group Instead of a Word Boundary

To resolve this issue, we can replace \b with a non-capturing group that explicitly matches either the beginning of the string or a whitespace character:

<code class="javascript">// The backslash is doubled because "\s" in a string literal is just the letter "s".
if (new RegExp("(?:^|\\s)" + searchterm, "gi").test(title)) {
  // Now it works for "äl"
}</code>

Breakdown:

(?:...): non-capturing group
^: beginning of the string
\s: any whitespace character
|: "or" operator
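
One consequence of these pieces is that the whitespace character becomes part of the match. If the position of the term itself is needed, one option is to capture the prefix and skip past it. This is only a sketch: it swaps the non-capturing group for a capturing one, and the names pattern, termIndex, and found are illustrative.

```javascript
const title = "this is simple string with finnish word tämä on ääkköstesti älkää ihmetelkö";
const searchterm = "äl";

// Group 1 holds the prefix (empty at string start, otherwise one
// whitespace character); group 2 holds the term itself.
const pattern = new RegExp("(^|\\s)(" + searchterm + ")", "i");
const match = pattern.exec(title);

// match.index points at the prefix, so skip past it to locate the term.
const termIndex = match ? match.index + match[1].length : -1;
const found = match ? match[2] : null;

console.log(found, termIndex === title.indexOf("älkää"));
```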

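This version matches "äl" in the title because the start-of-word condition no longer depends on \w: it only requires that the term sit at the beginning of the string or directly after whitespace. Note that the match now includes the leading whitespace character, and that a term preceded by punctuation (for example an opening parenthesis) is not treated as starting a word.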
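
Where punctuation before a word also needs to count as a boundary, a different technique can be used instead of the whitespace group: a negative lookbehind combined with Unicode property escapes. The sketch below is not part of the original solution. The helpers escapeRegExp and startsWord are hypothetical names, and treating "any Unicode letter, digit, or underscore" as a word character is an assumption.

```javascript
// Hypothetical helper: escape regex metacharacters in user-supplied text.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: true if `term` starts a word in `text`, where a
// word character is assumed to be any Unicode letter, digit, or underscore.
function startsWord(text, term) {
  const pattern = new RegExp(
    "(?<![\\p{L}\\p{N}_])" + escapeRegExp(term), // not preceded by a word character
    "iu" // "u" enables \p{...}; "i" keeps the search case-insensitive
  );
  return pattern.test(text);
}

const title = "this is simple string with finnish word tämä on ääkköstesti älkää ihmetelkö";

console.log(startsWord(title, "äl"));      // true: preceded by a space
console.log(startsWord("(älkää)", "äl"));  // true: preceded by punctuation
console.log(startsWord("tämä", "ämä"));    // false: preceded by the letter "t"
```

Lookbehind assertions and \p{...} escapes were added in ES2018, so this variant only works in engines that support them. Because the lookbehind consumes no characters, the match starts at the term itself.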
