How Can I Rewrite Java's \w, \b, and Other Regex Shortcuts for Full Unicode Compatibility?

Mary-Kate Olsen
Release: 2024-12-16 19:06:16

Rewriting \w and \b in Java Regexes for Unicode Compatibility

By default, Java's \w and \b regular expression shortcuts have limited Unicode support: \w, for instance, matches only the ASCII characters [a-zA-Z_0-9]. To address this, you can replace these shortcuts with the following Unicode-aware definitions:

\w (words) => [\pL\pM\p{Nd}\p{Nl}\p{Pc}[\p{InEnclosedAlphanumerics}&&\p{So}]]
\W (non-words) => [^\pL\pM\p{Nd}\p{Nl}\p{Pc}[\p{InEnclosedAlphanumerics}&&\p{So}]]

\b (word boundary) => (?:(?<=[\pL\pM\p{Nd}\p{Nl}\p{Pc}[\p{InEnclosedAlphanumerics}&&\p{So}]])(?![\pL\pM\p{Nd}\p{Nl}\p{Pc}[\p{InEnclosedAlphanumerics}&&\p{So}]])|(?<![\pL\pM\p{Nd}\p{Nl}\p{Pc}[\p{InEnclosedAlphanumerics}&&\p{So}]])(?=[\pL\pM\p{Nd}\p{Nl}\p{Pc}[\p{InEnclosedAlphanumerics}&&\p{So}]]))
\B (non-word boundary) => (?:(?<=[\pL\pM\p{Nd}\p{Nl}\p{Pc}[\p{InEnclosedAlphanumerics}&&\p{So}]])(?=[\pL\pM\p{Nd}\p{Nl}\p{Pc}[\p{InEnclosedAlphanumerics}&&\p{So}]])|(?<![\pL\pM\p{Nd}\p{Nl}\p{Pc}[\p{InEnclosedAlphanumerics}&&\p{So}]])(?![\pL\pM\p{Nd}\p{Nl}\p{Pc}[\p{InEnclosedAlphanumerics}&&\p{So}]]))
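
As a rough illustration of how the \w and \b rewrites above might be used, the sketch below builds them as Java string constants and compares them with the default \w on a short sample. The class name `UnicodeWordDemo`, the helper `findAll`, and the sample text are assumptions made for this example and are not part of the original definitions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UnicodeWordDemo {
    // The \w rewrite from above, written as a Java string literal
    // (every regex backslash is doubled).
    static final String WORD =
        "[\\pL\\pM\\p{Nd}\\p{Nl}\\p{Pc}[\\p{InEnclosedAlphanumerics}&&\\p{So}]]";

    // The \b rewrite: a word character on exactly one side of the position.
    static final String BOUNDARY =
        "(?:(?<=" + WORD + ")(?!" + WORD + ")|(?<!" + WORD + ")(?=" + WORD + "))";

    // Hypothetical helper: collects every match of regex in input.
    static List<String> findAll(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        // "naïve café 東京 x_1", written with escapes so the source stays ASCII.
        String text = "na\u00EFve caf\u00E9 \u6771\u4EAC x_1";

        // Default \w is ASCII-only, so accented words are split: [na, ve, caf, x_1]
        System.out.println(findAll("\\w+", text));

        // The rewritten class keeps each of the four tokens whole.
        System.out.println(findAll(WORD + "+", text));

        // Same tokens, this time delimited by the rewritten word boundary.
        System.out.println(findAll(BOUNDARY + WORD + "+" + BOUNDARY, text));
    }
}
```

On Java 7 and later, compiling a pattern with the `Pattern.UNICODE_CHARACTER_CLASS` flag is a built-in alternative worth comparing against these hand-written classes.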

Other Unicode-Aware Regexp Shortcuts:


\s (whitespace) => [\u0009-\u000D\u0020\u0085\u00A0\u1680\u180E\u2000-\u200A\u2028\u2029\u202F\u205F\u3000]
\S (non-whitespace) => [^\u0009-\u000D\u0020\u0085\u00A0\u1680\u180E\u2000-\u200A\u2028\u2029\u202F\u205F\u3000]
\v (vertical whitespace) => [\u000A-\u000D\u0085\u2028\u2029]
\V (non-vertical whitespace) => [^\u000A-\u000D\u0085\u2028\u2029]
\h (horizontal whitespace) => [\u0009\u0020\u00A0\u1680\u180E\u2000-\u200A\u202F\u205F\u3000]
\H (non-horizontal whitespace) => [^\u0009\u0020\u00A0\u1680\u180E\u2000\u2001-\u200A\u202F\u205F\u3000]
\d (digits) => \p{Nd}
\D (non-digits) => \P{Nd}
\R (line break) => (?:(?>\u000D\u000A)|[\u000A\u000B\u000C\u000D\u0085\u2028\u2029])
\X (extended grapheme cluster) => (?>\PM\pM*)
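
The whitespace, line-break, and grapheme definitions listed above can be tried out in the same way. The following is a minimal sketch, assuming the \R definition is an alternation between a CR LF pair and a single line-break character; the class name `UnicodeSpaceDemo`, the helper `count`, and the sample strings are hypothetical and exist only for this example.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UnicodeSpaceDemo {
    // \s rewrite. The backslash before each 'u' is doubled so that the regex
    // engine, not the Java compiler, interprets the escape.
    static final String SPACE =
        "[\\u0009-\\u000D\\u0020\\u0085\\u00A0\\u1680\\u180E\\u2000-\\u200A"
        + "\\u2028\\u2029\\u202F\\u205F\\u3000]";

    // \R rewrite: CR LF as one atomic unit, otherwise any single line break.
    static final String LINE_BREAK =
        "(?:(?>\\u000D\\u000A)|[\\u000A\\u000B\\u000C\\u000D\\u0085\\u2028\\u2029])";

    // \X rewrite: one non-mark character followed by any combining marks.
    static final String GRAPHEME = "(?>\\PM\\pM*)";

    // Hypothetical helper: counts the matches of regex in input.
    static int count(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        // NO-BREAK SPACE and IDEOGRAPHIC SPACE both act as separators: [a, b, c]
        System.out.println(Arrays.toString("a\u00A0b\u3000c".split(SPACE + "+")));

        // CR LF counts as a single break, so no empty field appears: [a, b, c, d]
        System.out.println(Arrays.toString("a\r\nb\u2028c\u0085d".split(LINE_BREAK)));

        // 'e' + COMBINING ACUTE ACCENT + 'a' is three chars but two clusters: 2
        System.out.println(count(GRAPHEME, "e\u0301a"));
    }
}
```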