PHP trim unicode spaces-PHP Chinese Network Q&A



PHP trim unicode spaces

P粉163951336 2023-11-13 08:49:45


I am trying to trim Unicode whitespace such as this character, and I was able to do it using this solution. The problem with this solution is that it only trims the ends of the string; it does not remove Unicode whitespace between normal characters. For example, this string uses thin spaces (U+2009):

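$string = "\u{2009}\u{2009}\u{2009}test\u{2009}\u{2009}\u{2009}string\u{2009}\u{2009}\u{2009}";
echo preg_replace('/^[\pZ\pC]+|[\pZ\pC]+$/u', '', $string); // outputs: test\u{2009}\u{2009}\u{2009}string (the inner thin spaces remain)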

I only know a little about regular expressions, so I don't know how to change my expression to solve this problem.


All replies (2)

P粉557957970 2023-11-14 00:59:08 (reply #2)

To remove all Unicode whitespace and control characters at the beginning and end of a string, and also every Unicode whitespace or control character other than a regular space anywhere within the string, you can use:

preg_replace('/^[\pZ\pC]+|[\pZ\pC]+$|(?! )[\pZ\pC]/u', '', $string)
// Or, simply
preg_replace('/^\s+|\s+$|[^\S ]/u', '', $string)

See regular expression demo #1 and regular expression demo #2.

Details

^[\pZ\pC]+ - One or more whitespace or control characters at the beginning of the string
| - or
[\pZ\pC]+$ - One or more whitespace or control characters at the end of the string
| - or
(?! )[\pZ\pC] - A single whitespace or control character other than a regular space, anywhere within the string
[^\S ] - Any whitespace character except a regular space (\x20)
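As a minimal sketch of how the three alternatives work together, the first pattern can be wrapped in a small function. The function name stripUnicodeSpaces is hypothetical and not part of the original answer, and the "\u{2009}" escapes assume PHP 7.0 or later.

```php
<?php
// Hypothetical helper name; the pattern itself is the one from the answer.
function stripUnicodeSpaces(string $text): string
{
    // ^[\pZ\pC]+     leading whitespace/control characters
    // [\pZ\pC]+$     trailing whitespace/control characters
    // (?! )[\pZ\pC]  any other single whitespace/control character except U+0020
    $result = preg_replace('/^[\pZ\pC]+|[\pZ\pC]+$|(?! )[\pZ\pC]/u', '', $text);

    // preg_replace() returns null on failure (for example, invalid UTF-8),
    // so fall back to the unchanged input in that case.
    return $result ?? $text;
}

// Thin spaces (U+2009) at both ends and between "test" and "string";
// regular spaces around "and".
$input = "\u{2009}\u{2009}test\u{2009}string and more\u{2009}";

var_dump(stripUnicodeSpaces($input) === 'teststring and more'); // bool(true)
```

Note that inner regular spaces survive while inner thin spaces are deleted outright, so words separated only by a thin space end up joined.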

If you also need to keep common newlines inside the string, replace (?! )[\pZ\pC] with (?![ \r\n])[\pZ\pC] (suggested by @MonkeyZeus); in the second regex, this means you need to use [^\S \r\n].
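The newline-preserving variant can be sketched the same way. The function name below is hypothetical; the pattern is the one described above, and PHP 7.0 or later is assumed for the "\u{2009}" escapes.

```php
<?php
// Hypothetical helper name; keeps regular spaces, CR and LF inside the string.
function stripUnicodeSpacesKeepNewlines(string $text): string
{
    $pattern = '/^[\pZ\pC]+|[\pZ\pC]+$|(?![ \r\n])[\pZ\pC]/u';
    $result = preg_replace($pattern, '', $text);

    // Fall back to the input if preg_replace() fails and returns null.
    return $result ?? $text;
}

// Two lines: a thin space inside the first line, and a thin space
// plus a newline at the very end.
$input = "\u{2009}first\u{2009}line\nsecond line\u{2009}\n";

var_dump(stripUnicodeSpacesKeepNewlines($input) === "firstline\nsecond line"); // bool(true)
```

Newlines at the very start or end of the string are still removed, because the first two alternatives match them; only newlines between other characters are kept.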

View PHP Demo:

echo preg_replace('~^[\pZ\pC]+|[\pZ\pC]+$|(?! )[\pZ\pC]~u', '', "abc def\u{2009}ghi\u{2009}"); // => abc defghi
echo preg_replace('/^\s+|\s+$|[^\S ]/u', '', "abc def\u{2009}ghi\u{2009}"); // => abc defghi


P粉445750942 2023-11-14 00:22:00 (reply #1)

Unicode spaces such as \u{2009} can cause problems in various places, so I would replace all Unicode spaces with regular spaces and then apply trim().

$string = "\u{2009}\u{2009}\u{2009}test\u{2009}\u{2009}\u{2009}string\u{2009}and XY \t\u{2009}";
// \u{2009}\u{2009}\u{2009}test\u{2009}\u{2009}\u{2009}string\u{2009}and\x20XY\x20\x09\u{2009}
$trimString = trim(preg_replace('/[\pZ\pC]/u', ' ', $string));
// test\x20\x20\x20string\x20and\x20XY

Note: The strings in the comments were produced with debug::writeUni($string, $trimString); which is implemented in this class.
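For readers without that debug class, here is a rough sketch of the replace-then-trim approach together with a small stand-in for visualizing invisible characters. Both function names are hypothetical, showInvisible() is not the debug::writeUni() implementation, and the code assumes PHP 7.2 or later with the mbstring extension loaded (for mb_ord()).

```php
<?php
// Replace every Unicode whitespace/control character with U+0020, then trim.
function normalizeAndTrim(string $text): string
{
    $normalized = preg_replace('/[\pZ\pC]/u', ' ', $text);
    return trim($normalized ?? $text); // null means preg_replace() failed
}

// Hypothetical stand-in for debug::writeUni(): render everything outside
// printable, non-space ASCII as \xNN (ASCII range) or \u{NNNN} (above it).
function showInvisible(string $text): string
{
    $shown = preg_replace_callback(
        '/[^\x21-\x7E]/u',
        static function (array $m): string {
            $codePoint = mb_ord($m[0], 'UTF-8');
            return $codePoint < 0x80
                ? sprintf('\x%02X', $codePoint)
                : sprintf('\u{%X}', $codePoint);
        },
        $text
    );
    return $shown ?? $text;
}

$string = "\u{2009}\u{2009}\u{2009}test\u{2009}\u{2009}\u{2009}string\u{2009}and XY \t\u{2009}";
echo showInvisible(normalizeAndTrim($string)), PHP_EOL;
// test\x20\x20\x20string\x20and\x20XY
```

Unlike the patterns in the other reply, this keeps one regular space for every inner Unicode space instead of deleting it. It also turns inner tabs and newlines into spaces, because those are control characters matched by \pC.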
