How Can I Create Unicode-Aware Regular Expressions in Java?

Barbara Streisand

Release: 2024-12-26 03:41:08

Unicode Equivalents for \w and \b in Java Regular Expressions

By default, Java's regex engine does not treat the \w shorthand as "any letter, digit, or connecting punctuation" the way Unicode-aware implementations do; it matches only the ASCII set [a-zA-Z_0-9]. This makes matching Unicode words more difficult. The issue extends to the \b word boundary, which also exhibits inconsistent behavior in Java.
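The default behavior is easy to observe. The following is a minimal sketch, not taken from the original answer; the class and method names (DefaultWordClassDemo, matchesWord, firstWord) are hypothetical and chosen only for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DefaultWordClassDemo {
    // Plain \w+ compiled with no flags, i.e. Java's default (ASCII-only) behavior.
    private static final Pattern DEFAULT_WORD = Pattern.compile("\\w+");

    // True when the entire input consists of default \w characters.
    static boolean matchesWord(String input) {
        return DEFAULT_WORD.matcher(input).matches();
    }

    // Returns the first run of default \w characters, or null if there is none.
    static String firstWord(String input) {
        Matcher m = DEFAULT_WORD.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String accented = "\u00e9lan"; // "elan" with an acute accent on the first letter

        System.out.println(matchesWord("hello_42")); // true: ASCII letters, digits, underscore
        System.out.println(matchesWord(accented));   // false: the accented letter is not in [a-zA-Z_0-9]
        System.out.println(firstWord(accented));     // lan: the match starts after the accented letter
    }
}
```

The last call shows the practical consequence: a search silently drops the non-ASCII letter instead of failing outright.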

Unicode-Aware Equivalents

To resolve these issues, one can rewrite a regex pattern using the following replacements:

\w: [\pL\pM\p{Nd}\p{Nl}\p{Pc}[\p{InEnclosedAlphanumerics}&&\p{So}]]
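\b: (?:(?<=[\pL\pM\p{Nd}\p{Nl}\p{Pc}[\p{InEnclosedAlphanumerics}&&\p{So}]])(?![\pL\pM\p{Nd}\p{Nl}\p{Pc}[\p{InEnclosedAlphanumerics}&&\p{So}]])|(?<![\pL\pM\p{Nd}\p{Nl}\p{Pc}[\p{InEnclosedAlphanumerics}&&\p{So}]])(?=[\pL\pM\p{Nd}\p{Nl}\p{Pc}[\p{InEnclosedAlphanumerics}&&\p{So}]]))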
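In Java source, each backslash in these replacements must be doubled inside a string literal, and building the boundary from a shared constant avoids repeating the long class four times. The sketch below assumes the boundary means "a word character on exactly one side of the position", so its second alternative is written as the mirror image of the first. The names UnicodeWordPatterns, WORD_CHAR, WORD_BOUNDARY, and containsWholeWord are hypothetical.

```java
import java.util.regex.Pattern;

public class UnicodeWordPatterns {
    // Character class standing in for \w, written as a Java string literal.
    static final String WORD_CHAR =
        "[\\pL\\pM\\p{Nd}\\p{Nl}\\p{Pc}[\\p{InEnclosedAlphanumerics}&&\\p{So}]]";

    // Stand-in for \b: a word character behind but not ahead, or ahead but not behind.
    // The second alternative is an assumption (the mirror image of the first).
    static final String WORD_BOUNDARY =
        "(?:(?<=" + WORD_CHAR + ")(?!" + WORD_CHAR + ")"
        + "|(?<!" + WORD_CHAR + ")(?=" + WORD_CHAR + "))";

    // True when 'word' occurs in 'text' with a boundary on both sides.
    static boolean containsWholeWord(String text, String word) {
        Pattern p = Pattern.compile(WORD_BOUNDARY + Pattern.quote(word) + WORD_BOUNDARY);
        return p.matcher(text).find();
    }

    public static void main(String[] args) {
        String cafe = "caf\u00e9"; // "cafe" with an acute accent on the final letter

        System.out.println(containsWholeWord(cafe + " au lait", cafe)); // true
        System.out.println(containsWholeWord(cafe + "s", cafe));        // false: a letter follows
        // A combining mark (\pM) counts as a word character, so no boundary after "cafe" here.
        System.out.println(containsWholeWord("cafe\u0301", "cafe"));    // false
    }
}
```

Keeping the class in one constant also means that any later adjustment to the definition of a word character applies to both replacements at once.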

Other Unicode Properties

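In addition to \w and \b, Java's POSIX-style character classes are not Unicode-aware by default: each one matches only its ASCII subset. These classes are written with the \p syntax, and the table below lists each one next to the Unicode property it would need to cover: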

Java Syntax	Unicode Property
\p{Lower}	Unicode Lowercase
\p{Upper}	Unicode Uppercase
\p{ASCII}	ASCII
\p{Alpha}	Unicode Alphabetic
\p{Digit}	Unicode Digit
\p{Alnum}	Unicode Alphanumeric
\p{Punct}	Unicode Punctuation
\p{Graph}	Unicode Graph
\p{Print}	Unicode Printable
\p{Blank}	Unicode Blank
\p{Cntrl}	Unicode Control
\p{XDigit}	Unicode Hexadecimal Digit
\p{Space}	Unicode Space
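To see the gap between a POSIX-style class and the Unicode property it is named after, the two can be compared directly. This is a small sketch under one assumption not stated in the table: it uses the binary property names \p{IsLowercase} and \p{IsDigit} that java.util.regex accepts (Java 7 and later) as the Unicode-aware counterparts. The class name PosixVsUnicodeDemo and the helper matches are hypothetical.

```java
import java.util.regex.Pattern;

public class PosixVsUnicodeDemo {
    // True when the whole input matches the given regex.
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        String eAcute = "\u00e9";      // Latin small letter e with acute
        String arabicThree = "\u0663"; // Arabic-Indic digit three

        System.out.println(matches("\\p{Lower}", "a"));            // true
        System.out.println(matches("\\p{Lower}", eAcute));         // false: ASCII-only by default
        System.out.println(matches("\\p{IsLowercase}", eAcute));   // true: Unicode Lowercase property

        System.out.println(matches("\\p{Digit}", arabicThree));    // false: only 0-9 by default
        System.out.println(matches("\\p{IsDigit}", arabicThree));  // true: Unicode Digit property
    }
}
```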

Unicode-Aware Regex

By incorporating these Unicode-aware substitutes, one can create regex patterns that handle Unicode data accurately. For example, the following pattern matches Unicode words:

Pattern pattern = Pattern.compile(
    "[\\pL\\pM\\p{Nd}\\p{Nl}\\p{Pc}[\\p{InEnclosedAlphanumerics}&&\\p{So}]]+"); // Unicode-aware equivalent of \w+

This pattern can be used to match words in text strings, regardless of whether the characters fall inside or outside the ASCII range.
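As a usage illustration, the word pattern can be applied with Matcher.find() to pull every word out of mixed-script text. This is a sketch rather than a definitive implementation; the class name UnicodeWordExtractor and the method words are hypothetical, and the sample text is invented for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UnicodeWordExtractor {
    // One or more characters from the Unicode-aware stand-in for \w.
    private static final Pattern WORD = Pattern.compile(
        "[\\pL\\pM\\p{Nd}\\p{Nl}\\p{Pc}[\\p{InEnclosedAlphanumerics}&&\\p{So}]]+");

    // Collects every maximal run of word characters, in order of appearance.
    static List<String> words(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = WORD.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // Latin with diacritics, Cyrillic, an identifier with an underscore, and digits.
        String text = "na\u00efve caf\u00e9, \u043f\u0440\u0438\u0432\u0435\u0442 snake_case 42!";
        for (String w : words(text)) {
            System.out.println(w); // five words; the comma and "!" are skipped
        }
    }
}
```

Compiling the pattern once into a static final field avoids recompiling the fairly long class on every call.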
