Rewriting \w and \b in Java Regexes for Unicode Compatibility
By default, Java's `\w` and `\b` regular-expression shortcuts have limited Unicode support: `\w`, for example, matches only `[a-zA-Z_0-9]`. To work around this, you can replace the shortcuts with the following Unicode-aware definitions:

`\w` (words) => `[\pL\pM\p{Nd}\p{Nl}\p{Pc}[\p{InEnclosedAlphanumerics}&&\p{So}]]`

`\W` (non-words) => `[^\pL\pM\p{Nd}\p{Nl}\p{Pc}[\p{InEnclosedAlphanumerics}&&\p{So}]]`

`\b` (word boundary) => `(?:(?<=[\pL\pM\p{Nd}\p{Nl}\p{Pc}[\p{InEnclosedAlphanumerics}&&\p{So}]])(?![\pL\pM\p{Nd}\p{Nl}\p{Pc}[\p{InEnclosedAlphanumerics}&&\p{So}]])|(?<![\pL\pM\p{Nd}\p{Nl}\p{Pc}[\p{InEnclosedAlphanumerics}&&\p{So}]])(?=[\pL\pM\p{Nd}\p{Nl}\p{Pc}[\p{InEnclosedAlphanumerics}&&\p{So}]]))`

`\B` (non-word boundary) => `(?:(?<=[\pL\pM\p{Nd}\p{Nl}\p{Pc}[\p{InEnclosedAlphanumerics}&&\p{So}]])(?=[\pL\pM\p{Nd}\p{Nl}\p{Pc}[\p{InEnclosedAlphanumerics}&&\p{So}]])|(?<![\pL\pM\p{Nd}\p{Nl}\p{Pc}[\p{InEnclosedAlphanumerics}&&\p{So}]])(?![\pL\pM\p{Nd}\p{Nl}\p{Pc}[\p{InEnclosedAlphanumerics}&&\p{So}]]))`

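
The word and word-boundary definitions above can be assembled from a single string constant, so the long character class is written only once. The following is a minimal sketch, not a definitive implementation: the class name `UnicodeWordDemo` and its helper methods are hypothetical names chosen for illustration, and the sample strings are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; all names here are illustrative only.
final class UnicodeWordDemo {
    // Body of the Unicode-aware word class, without the outer brackets.
    static final String WORD_BODY =
        "\\pL\\pM\\p{Nd}\\p{Nl}\\p{Pc}[\\p{InEnclosedAlphanumerics}&&\\p{So}]";

    // Stand-in for the word shortcut.
    static final String WORD = "[" + WORD_BODY + "]";

    // Stand-in for the word-boundary shortcut:
    // a word character on exactly one side of the current position.
    static final String BOUNDARY =
        "(?:(?<=" + WORD + ")(?!" + WORD + ")|(?<!" + WORD + ")(?=" + WORD + "))";

    // Returns every maximal run of Unicode word characters in the text.
    static List<String> words(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(WORD + "+").matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    // True if the literal word occurs in the text with a boundary on both sides.
    static boolean containsWholeWord(String word, String text) {
        String regex = BOUNDARY + Pattern.quote(word) + BOUNDARY;
        return Pattern.compile(regex).matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(words("na\u00EFve caf\u00E9"));
        System.out.println(containsWholeWord("caf\u00E9", "un caf\u00E9 noir"));
        System.out.println(containsWholeWord("caf\u00E9", "caf\u00E9s"));
    }
}
```

Building the boundary from the `WORD` constant keeps the two definitions in sync. The lookbehind and lookahead each inspect a single character, so the boundary itself consumes no input.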
Other Unicode-Aware Regexp Shortcuts:
Regexp Shortcut | Unicode-Aware Definition
---|---
`\s` (whitespace) | `[\u0009-\u000D\u0020\u0085\u00A0\u1680\u180E\u2000-\u200A\u2028\u2029\u202F\u205F\u3000]`
`\S` (non-whitespace) | `[^\u0009-\u000D\u0020\u0085\u00A0\u1680\u180E\u2000-\u200A\u2028\u2029\u202F\u205F\u3000]`
`\v` (vertical whitespace) | `[\u000A-\u000D\u0085\u2028\u2029]`
`\V` (non-vertical whitespace) | `[^\u000A-\u000D\u0085\u2028\u2029]`
`\h` (horizontal whitespace) | `[\u0009\u0020\u00A0\u1680\u180E\u2000-\u200A\u202F\u205F\u3000]`
`\H` (non-horizontal whitespace) | `[^\u0009\u0020\u00A0\u1680\u180E\u2000-\u200A\u202F\u205F\u3000]`
`\d` (digits) | `\p{Nd}`
`\D` (non-digits) | `\P{Nd}`
`\R` (line break) | `(?:(?>\u000D\u000A)\|[\u000A\u000B\u000C\u000D\u0085\u2028\u2029])`
`\X` (extended grapheme cluster) | `(?>\PM\pM*)`
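
A few of the table's definitions can be tried out as Java string constants. This is a sketch under stated assumptions: the class name `UnicodeShortcuts`, its constants, and the `countMatches` helper are hypothetical names introduced for illustration, and only four of the table's rows are covered. Note that in the line-break pattern the alternation is a plain `|`.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical holder for some of the definitions in the table; names are illustrative.
final class UnicodeShortcuts {
    // Whitespace definition from the table.
    static final String SPACE =
        "[\\u0009-\\u000D\\u0020\\u0085\\u00A0\\u1680\\u180E"
        + "\\u2000-\\u200A\\u2028\\u2029\\u202F\\u205F\\u3000]";

    // Digit definition: any decimal digit in any script.
    static final String DIGIT = "\\p{Nd}";

    // Line-break definition: CR LF is tried first, atomically, so it counts as one break.
    static final String LINE_BREAK =
        "(?:(?>\\u000D\\u000A)|[\\u000A\\u000B\\u000C\\u000D\\u0085\\u2028\\u2029])";

    // Grapheme definition from the table: one non-mark followed by any marks.
    static final String GRAPHEME = "(?>\\PM\\pM*)";

    // Counts the non-overlapping matches of a regex in the text.
    static int countMatches(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // A no-break space separates the two letters.
        System.out.println(Pattern.compile(SPACE).split("a\u00A0b").length);
        // Arabic-Indic digits one, two, three.
        System.out.println(Pattern.matches(DIGIT + "+", "\u0661\u0662\u0663"));
        // One CR LF pair and one line separator.
        System.out.println(Pattern.compile(LINE_BREAK).split("a\r\nb\u2028c").length);
        // A letter plus combining acute accent, then a plain letter.
        System.out.println(countMatches(GRAPHEME, "e\u0301a"));
    }
}
```

For comparison, with default flags Java's built-in `\s` does not match the no-break space and `\d` does not match the Arabic-Indic digits used here.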