PHP preg_replace(): regular expression string replacement

Not all the data a program processes is designed up front with a database in mind, and some of it cannot be stored in a database structure at all.
Examples include templates parsed by a template engine and text that has to be filtered for spam or sensitive words.
In these cases we usually turn to regular expressions: preg_match() to match and preg_replace() to replace, according to our own rules.
Most everyday applications, however, are little more than database CRUD, so there are few opportunities to practise regular expressions.
In short, there are two scenarios: statistics and analysis, which use matching; and processing, which uses replacement.

PHP's preg_replace() differs from regular expression replacement in Javascript: by default, preg_replace() replaces every match in the string, not just the first one.
preg_replace(pattern, replacement, subject, limit [maximum number of replacements, default -1 = unlimited], count [variable that receives the number of replacements made])
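
The limit and count parameters listed above can be tried with a short sketch. The sample string and variable names are made up for illustration and do not come from the original article.

```php
<?php
// Hypothetical sample data, used only for illustration.
$text = 'cat, cat, cat';

// Default limit (-1): every match is replaced; $count receives the total.
$all = preg_replace('/cat/', 'dog', $text, -1, $count);
echo $all, ' (', $count, " replacements)\n";

// limit = 1: only the first match is replaced,
// which resembles a Javascript regular expression without the "g" flag.
$first = preg_replace('/cat/', 'dog', $text, 1, $firstCount);
echo $first, ' (', $firstCount, " replacement)\n";
```

Passing the count variable is optional; it is mainly useful for checking whether anything was replaced at all.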

Regular expressions in most languages are similar, but there are also subtle differences.

PHP Regular Expression

Character Description
\ Marks the next character as a special character, a literal character, a backreference, or an octal escape. For example, "n" matches the character "n", while "\n" matches a newline character. The sequence "\\" matches "\" and "\(" matches "(".
^ Matches the beginning of the input string. If the m (multiline) modifier is set, ^ also matches the position immediately after a newline ("\n" by default).
$ Matches the end of the input string. If the m (multiline) modifier is set, $ also matches the position immediately before a newline ("\n" by default).
* Matches the preceding subexpression zero or more times. For example, "zo*" matches "z" and "zoo". * is equivalent to {0,}.
+ Matches the preceding subexpression one or more times. For example, "zo+" matches "zo" and "zoo", but not "z". + is equivalent to {1,}.
? Matches the preceding subexpression zero or one time. For example, "do(es)?" matches "do" and "does". ? is equivalent to {0,1}.
{n} n is a non-negative integer. Matches exactly n times. For example, "o{2}" does not match the "o" in "Bob", but matches the two o's in "food".
{n,} n is a non-negative integer. Matches at least n times. For example, "o{2,}" does not match the "o" in "Bob", but matches all the o's in "foooood". "o{1,}" is equivalent to "o+", and "o{0,}" is equivalent to "o*".
{n,m} m and n are non-negative integers, where n<=m. Matches at least n and at most m times. For example, "o{1,3}" matches the first three o's in "fooooood". "o{0,1}" is equivalent to "o?". Note that there must be no space between the comma and the two numbers.
? When this character immediately follows any other quantifier (*, +, ?, {n}, {n,}, {n,m}), the match is non-greedy. Non-greedy mode matches as little of the searched string as possible, while the default greedy mode matches as much as possible. For example, for the string "oooo", "o+?" matches a single "o", while "o+" matches all the o's.
. Matches any single character except "\n". To match any character including "\n", use a pattern such as "[\s\S]" or add the s modifier.
(pattern) Matches pattern and captures the match. In PHP the captured text is available in the $matches array filled by preg_match(), or as $1, $2, ... in the replacement string of preg_replace(). To match literal parentheses, use "\(" or "\)".
(?:pattern) Matches pattern but does not capture the match; it is a non-capturing group and nothing is stored for later use. This is useful when combining parts of a pattern with the "or" character "|". For example, "industr(?:y|ies)" is a more compact expression than "industry|industries".
(?=pattern) Positive lookahead: matches at any position where the text that follows matches pattern. This is a non-capturing match, so the lookahead text is not stored for later use. For example, "Windows(?=95|98|NT|2000)" matches the "Windows" in "Windows2000", but not the "Windows" in "Windows3.1". A lookahead does not consume characters: after a match, the search for the next match begins immediately after the last match, not after the characters that made up the lookahead.
(?!pattern) Negative lookahead: matches at any position where the text that follows does not match pattern. This is a non-capturing match, so the lookahead text is not stored for later use. For example, "Windows(?!95|98|NT|2000)" matches the "Windows" in "Windows3.1", but not the "Windows" in "Windows2000".
(?<=pattern) Positive lookbehind: like positive lookahead, but looking in the opposite direction. For example, "(?<=95|98|NT|2000)Windows" matches the "Windows" in "2000Windows", but not the "Windows" in "3.1Windows".
(?<!pattern) Negative lookbehind: like negative lookahead, but looking in the opposite direction. For example, "(?<!95|98|NT|2000)Windows" matches the "Windows" in "3.1Windows", but not the "Windows" in "2000Windows".
x|y Matches x or y. For example, "z|food" matches "z" or "food". "(z|f)ood" matches "zood" or "food".
[xyz] Character set. Matches any one of the enclosed characters. For example, "[abc]" matches the "a" in "plain".
[^xyz] Negated character set. Matches any character that is not enclosed. For example, "[^abc]" matches the "p", "l", "i" and "n" in "plain".
[a-z] Character range. Matches any character within the specified range. For example, "[a-z]" matches any lowercase letter from "a" through "z". Note: the hyphen denotes a range only when it stands inside the character class between two characters; at the beginning of the class it is just a literal hyphen.
[^a-z] Negated character range. Matches any character not within the specified range. For example, "[^a-z]" matches any character that is not in the range "a" through "z".
\b Matches a word boundary, that is, the position between a word character and a non-word character such as a space. For example, "er\b" matches the "er" in "never" but not the "er" in "verb".
\B Matches a non-word boundary. "er\B" matches the "er" in "verb", but not the "er" in "never".
\cx Matches the control character specified by x. For example, \cM matches a Control-M or carriage return character. The value of x should be one of A-Z or a-z.
\d Matches a digit character. Equivalent to [0-9].
\D Matches a non-digit character. Equivalent to [^0-9].
\f Matches a form feed character. Equivalent to \x0c and \cL.
\n Matches a newline character. Equivalent to \x0a and \cJ.
\r Matches a carriage return character. Equivalent to \x0d and \cM.
\s Matches any whitespace character, including spaces, tabs, form feeds, etc. Equivalent to [ \f\n\r\t\v].
\S Matches any non-whitespace character. Equivalent to [^ \f\n\r\t\v].
\t Matches a tab character. Equivalent to \x09 and \cI.
\v In PHP (PCRE) matches any vertical whitespace character, which includes the vertical tab. To match only a vertical tab, use \x0b or \cK.
\w Matches any word character, including the underscore. Equivalent to "[A-Za-z0-9_]".
\W Matches any non-word character. Equivalent to "[^A-Za-z0-9_]".
\xn Matches n, where n is a hexadecimal escape value of at most two digits. For example, "\x41" matches "A". "\x041" is equivalent to "\x04" followed by "1". This allows ASCII codes to be used in regular expressions.
\num Matches num, where num is a positive integer: a backreference to a captured match. For example, "(.)\1" matches two consecutive identical characters.
\n Identifies an octal escape value or a backreference. \n is a backreference if it is preceded by at least n captured subexpressions. Otherwise, if n is an octal digit (0-7), \n is an octal escape value.
\nm Identifies an octal escape value or a backreference. If \nm is preceded by at least nm captured subexpressions, nm is a backreference. If \nm is preceded by at least n captured subexpressions, n is a backreference followed by the literal m. If neither condition holds, and n and m are both octal digits (0-7), \nm matches the octal escape value nm.
\nml If n is an octal digit (0-3), and m and l are both octal digits (0-7), matches the octal escape value nml.
\x{n} Matches the Unicode character whose hexadecimal code point is n; this form requires the u modifier. For example, "\x{00A9}" matches the copyright symbol (©). The "\un" form known from Javascript is not supported by PHP's PCRE engine.
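
A few entries from the table can be checked directly in PHP. The following is a minimal sketch that reuses the table's own example strings; the variable names are only illustrative.

```php
<?php
// Positive lookahead: "Windows" only when followed by one of the listed versions.
$hit  = preg_match('/Windows(?=95|98|NT|2000)/', 'Windows2000');
$miss = preg_match('/Windows(?=95|98|NT|2000)/', 'Windows3.1');

// Word boundary: "er" must sit at the end of a word.
$never = preg_match('/er\b/', 'never');
$verb  = preg_match('/er\b/', 'verb');

// Backreference \1 inside the pattern: collapse a doubled character into one.
$collapsed = preg_replace('/(.)\1/', '$1', 'food');

var_dump($hit, $miss, $never, $verb, $collapsed);
```

preg_match() returns 1 when the pattern matches and 0 when it does not, so the first four values show which table examples match.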

The table above is a fairly complete reference for regular expression syntax. The characters in it have special meanings and no longer stand for themselves. For example, "+" in a regular expression is not a plus sign; it means "match one or more times". To make "+" stand for a plus sign, put the escape character "\" in front of it, that is, write "\+".
To match 1+1=2, the regular expression is: 1\+1=2
The unescaped regular expression 1+1=2, by contrast, matches one or more 1s followed by 1=2, that is:
11=2 is matched by the regular expression: 1+1=2
111=2 is matched by the regular expression: 1+1=2
1111=2 is matched by the regular expression: 1+1=2
……
In other words, every metacharacter has a specific meaning and must be escaped with "\" to get its literal meaning back. Escaping a non-alphanumeric character that is not a metacharacter is harmless.
The regular expression for 1+1=2 can therefore also be written: 1\+1\=2
Escaping every character is not recommended, though, and letters and digits must not be escaped this way, because sequences such as "\d" or "\1" have meanings of their own.
Regular expressions must be enclosed in delimiters. In Javascript the delimiter is "/". In PHP "/" is also the most common delimiter, but "#" can be used as well, and the whole pattern, delimiters included, must be written as a quoted string.
If the regular expression itself contains the delimiter character, that character must be escaped.
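
When the text to be matched literally comes from a variable, PHP's built-in preg_quote() can do the escaping, including the delimiter when it is passed as the second argument. A minimal sketch, with a made-up sample sentence:

```php
<?php
$literal = '1+1=2';

// preg_quote() escapes the metacharacters; passing '/' also escapes the delimiter.
$pattern = '/' . preg_quote($literal, '/') . '/';
echo $pattern, "\n";

// The escaped pattern finds the literal text ...
$escapedMatch = preg_match($pattern, 'so 1+1=2 holds');
// ... while the unescaped pattern treats "+" as a quantifier and does not.
$rawMatch = preg_match('/1+1=2/', 'so 1+1=2 holds');
// The unescaped pattern matches a run of 1s followed by "1=2" instead.
$runMatch = preg_match('/1+1=2/', '111=2');

var_dump($escapedMatch, $rawMatch, $runMatch);
```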

PHP regular expression delimiter
Most languages delimit regular expressions with "/". PHP also accepts "#" as a delimiter, which helps when the pattern contains many "/" characters: with "/" as the delimiter every "/" must be escaped, whereas with "#" no escaping is needed and the pattern is more concise.
<?php
$weigeti = 'The URL of the W3CSchool online tutorial is http://e.jbxue.com/. Can you replace this URL with the correct URL?';
// The requirement is to replace http://e.jbxue.com/ with http://e.jbxue.com/w3c/
// "." is a regular expression metacharacter, so it is escaped; "/" is the delimiter, so every "/" inside the pattern must be escaped as well
echo preg_replace('/http:\/\/e\.jbxue\.com\//', 'http://e.jbxue.com/w3c/', $weigeti);
// When # is used as the delimiter, / is no longer a delimiter and does not need to be escaped
echo preg_replace('#http://e\.jbxue\.com/#', 'http://e.jbxue.com/w3c/', $weigeti);
// Both calls output the same result: [The URL of the W3CSchool online tutorial is http://e.jbxue.com/w3c/. Can you replace this URL with the correct URL?]
?>
As the two replacement examples above show, when a pattern contains many "/" characters, either "/" or "#" works as the delimiter, but "#" makes the code more concise. Nevertheless, E-Dimension Technology recommends sticking with "/" as the delimiter, because languages such as Javascript accept only "/". That way the habit carries over to other languages.
PHP regular expression modifier

The modifier is placed after the closing delimiter "/" of the PHP regular expression and before the closing quotation mark of the pattern string.
i Case-insensitive: matching ignores letter case.
m Multiline: ^ and $ match at the start and end of every line. If the string contains no newline such as "\n", the result is the same as without the modifier.
s Makes the metacharacter . match the newline "\n" as well. Without it, . cannot match "\n".
x Ignores unescaped whitespace in the pattern.
e Runs the replacement string through eval() as PHP code for each match. This modifier was deprecated in PHP 5.5 and removed in PHP 7.0; preg_replace_callback() takes its place.
A Anchored: the match is constrained to start at the beginning of the target string.
D Makes $ match only at the very end of the string. Without D, if the string ends with a newline such as "\n", $ also matches just before that final newline. If the m modifier is set, D is ignored.
S Spends extra time analysing the pattern, which can speed up non-anchored patterns that are used many times.
U Ungreedy: quantifiers become non-greedy by default, and adding "?" after a quantifier makes it greedy again.
X Enables extra PCRE features that are incompatible with Perl.
u Treats the pattern and the subject string as UTF-8, so that multibyte characters such as Chinese are handled as whole characters. Both strings must be valid UTF-8, otherwise the call fails. According to E-dimensional Technology's investigation, a bug can occur when using this modifier. This bug URL:
If you know Javascript regular expressions, you will know the "g" modifier, which makes the expression match every occurrence. PHP's preg_replace() already replaces every match by default, so PHP has no "g" modifier.
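
The effect of some modifiers is easiest to see side by side. The sketch below compares default greedy matching with the U modifier and then combines i, x and U; the sample markup is invented for illustration.

```php
<?php
// Hypothetical sample markup.
$html = '<b>one</b> and <b>two</b>';

// Greedy by default: .* runs up to the last </b>, so one replacement covers everything.
$greedy = preg_replace('/<b>.*<\/b>/', '[bold]', $html);

// U inverts greediness: .* stops at the first </b>, so each tag pair is replaced separately.
$lazy = preg_replace('/<b>.*<\/b>/U', '[bold]', $html);

// i, x and U combined: case is ignored and the spaces inside the pattern are skipped.
$combined = preg_replace('/<b> .* <\/b>/ixU', '[bold]', '<B>one</B> and <B>two</B>');

echo $greedy, "\n", $lazy, "\n", $combined, "\n";
```

Modifiers can be written in any order after the closing delimiter.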

PHP regular expressions: Chinese characters and ignoring case
PHP preg_replace() is case-sensitive and, by default, treats strings as sequences of single bytes, which is only reliable for ASCII text. To match case-insensitively or to match Chinese characters, add the corresponding modifier, i or u.
<?php
// 网址 is the Chinese word for "URL"; this file must be saved as UTF-8
$weigeti = 'W3CSchool online tutorial 网址: http://www.jbxue.com/w3school/';
echo preg_replace('/w3cschool/', 'w3c', $weigeti);
// The case differs, so nothing is replaced; outputs [W3CSchool online tutorial 网址: http://www.jbxue.com/w3school/]
echo preg_replace('/w3cschool/i', 'w3c', $weigeti);
// Case is ignored and the replacement is performed; outputs [w3c online tutorial 网址: http://www.jbxue.com/w3school/]
echo preg_replace('/网址/u', '', $weigeti);
// UTF-8 mode, the Chinese word is removed; outputs [W3CSchool online tutorial : http://www.jbxue.com/w3school/]
?>
PHP regular expressions are sensitive to both case and multibyte text such as Chinese. Javascript regular expressions are also case-sensitive and also use the i modifier to ignore case, but Javascript does not need to be told that a string contains multibyte characters such as UTF-8 Chinese; it matches Chinese directly.
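
What the u modifier changes can be made visible by counting what "." matches. This is a small sketch that assumes the source file is saved as UTF-8; the character range \x{4e00}-\x{9fa5} is a commonly used approximation for Chinese characters and is an assumption here, not something taken from the article.

```php
<?php
// Assumes this file is saved as UTF-8.
$text = '中文';

// Without u, "." matches one byte at a time (each of these characters is 3 bytes in UTF-8).
$bytes = preg_match_all('/./', $text);

// With u, "." matches one whole character.
$chars = preg_match_all('/./u', $text);

// Mask every character in the assumed Chinese range with "*".
$masked = preg_replace('/[\x{4e00}-\x{9fa5}]/u', '*', 'PHP 中文 tutorial');

echo $bytes, ' ', $chars, ' ', $masked, "\n";
```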

PHP regular newline character example
When a PHP regular expression meets a newline character, it treats it as an ordinary character in the middle of the string. However, the metacharacter . cannot match "\n", so there are several points to watch when the string contains newlines.

<?php
$weigeti = "Jbxue.com\nIS\nLOVING\nYOU";
// Goal: reduce $weigeti to just Jbxue.com
echo preg_replace('/^[A-Z].*[A-Z]$/', '', $weigeti);
// $weigeti starts with J, which matches [A-Z], and ends with U, which also matches [A-Z],
// but . cannot match \n, so the pattern cannot span the lines and nothing is replaced
// Output [Jbxue.com\nIS\nLOVING\nYOU]
echo preg_replace('/^[A-Z].*[A-Z]$/s', '', $weigeti);
// With the modifier s, . can match \n, so the whole string matches and the output is empty
// Output []
echo preg_replace('/^[A-Z].*[A-Z]$/m', '', $weigeti);
// With the modifier m, every line is matched independently. It is equivalent to:
/*
$preg_m = preg_replace('/^[A-Z].*[A-Z]$/m', '', $weigeti);
$p = '/^[A-Z].*[A-Z]$/';
$a = preg_replace($p, '', 'Jbxue.com'); // no match: the line ends with a lowercase letter
$b = preg_replace($p, '', 'IS');
$c = preg_replace($p, '', 'LOVING');
$d = preg_replace($p, '', 'YOU');
$preg_m === implode("\n", array($a, $b, $c, $d));
*/
// Output [Jbxue.com] followed by the three newline characters that remain
?>

Later, when you use PHP to crawl a website and batch-replace its content with regular expressions, it is easy to overlook that the fetched content contains line breaks, so pay attention to them when writing the replacement.
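
A typical case is removing a block that spans several lines from fetched markup. The following sketch uses an invented HTML fragment and shows that the same pattern only works once the s modifier is added.

```php
<?php
// Hypothetical fetched markup containing a comment that spans two lines.
$html = "<p>keep</p>\n<!-- ad\nblock -->\n<p>this</p>";

// Without s, "." stops at the line break, so the two-line comment is never matched.
$unchanged = preg_replace('/<!--.*?-->/', '', $html);

// With s, "." also matches line breaks and the whole comment is removed.
$cleaned = preg_replace('/<!--.*?-->/s', '', $html);

var_dump($unchanged === $html);
echo $cleaned, "\n";
```

The non-greedy ".*?" keeps each match as short as possible, so two separate comments in the same document would not be merged into one match.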
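PHP regular expressions: executing a function on matches
PHP regular expression replacement can use the modifier e, which stands for eval() and executes PHP code on the matched content. Note that the e modifier was deprecated in PHP 5.5 and removed in PHP 7.0, so it only works on older versions.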
<?php
// Requires PHP < 7.0: the e modifier has been removed from later versions
$weigeti = 'W3CSchool online tutorial website: http://WWW.JBXUE.COM/, are you Jbzj!?';
// Convert the URL above to lower case
echo preg_replace('/(http:[\/\w.-]+\/)/e', 'strtolower("$1")', $weigeti);
// With the modifier e, the PHP function strtolower() is executed on the matched URL
// Output [W3CSchool online tutorial website: http://www.jbxue.com/, are you Jbzj!?]
?>
As the code above shows, although the function call strtolower() is written inside a quoted string, eval() still executes it.
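
On PHP 7.0 and later the e modifier no longer exists, and preg_replace_callback() is the usual way to run a function on each match. Below is a minimal sketch of the same lower-casing task; the sample sentence with an upper-case URL is an adaptation for illustration.

```php
<?php
// Sample sentence adapted for illustration; the URL is upper case on purpose.
$weigeti = 'W3CSchool online tutorial website: http://WWW.JBXUE.COM/, are you Jbzj!?';

// The callback receives the match array; $m[0] is the whole matched URL.
$result = preg_replace_callback(
    '/http:[\/\w.-]+\//',
    function (array $m): string {
        return strtolower($m[0]);
    },
    $weigeti
);

echo $result, "\n";
```

Because the callback is ordinary PHP code rather than a string passed to eval(), matched text is never interpreted as code.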

Backreferences in the replacement string
If you are familiar with Javascript, you will know backreferences such as $1 $2 $3..., and PHP accepts the same notation in the replacement string. In PHP you can also write a backreference as \1, which is typed '\1' or '\\1' in a single-quoted string.
The idea of a backreference is this: the pattern matches one large fragment, parentheses inside the pattern cut it into several smaller captured pieces, and each piece can then be referred to in the replacement by the position of its opening parenthesis.
<?php
$weigeti = 'W3CSchool Online Tutorial URL: http://www.jbxue.com/, Jbzj!?';
// The hyphens inside the character classes are escaped; an unescaped [\w-!] can be rejected as an invalid range on PHP 7.3 and later
echo preg_replace('/.+(http:[\w\-\/.]+\/)[^\w\-!]+([\w\-!]+).+/', '$1', $weigeti);
echo preg_replace('/.+(http:[\w\-\/.]+\/)[^\w\-!]+([\w\-!]+).+/', '\1', $weigeti);
echo preg_replace('/.+(http:[\w\-\/.]+\/)[^\w\-!]+([\w\-!]+).+/', '\\1', $weigeti);
// All three output [http://www.jbxue.com/]
echo preg_replace('/^(.+) URL: (http:[\w\-\/.]+\/)[^\w\-!]+([\w\-!]+).+$/', 'Column: $1
Website: $2
Trademark: $3', $weigeti);
/*
Column: W3CSchool Online Tutorial
Website: http://www.jbxue.com/
Trademark: Jbzj!
*/
// With nested parentheses, the outer pair is counted first
echo preg_replace('/^((.+) URL: (http:[\w\-\/.]+\/)[^\w\-!]+([\w\-!]+).+)$/', 'Original text: $1
Column: $2
Website: $3
Trademark: $4', $weigeti);
/*
Original text: W3CSchool Online Tutorial URL: http://www.jbxue.com/, Jbzj!?
Column: W3CSchool Online Tutorial
Website: http://www.jbxue.com/
Trademark: Jbzj!
*/
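
Two further details of backreferences in the replacement string are worth a short sketch: reordering captured groups, and the ${1} form, which is needed when a backreference is followed directly by a digit. The sample values are invented for illustration.

```php
<?php
// Reorder three captured groups: year-month-day becomes day/month/year.
$date = preg_replace('/(\d{4})-(\d{2})-(\d{2})/', '$3/$2/$1', '2024-01-31');

// ${1}1 means "group 1 followed by a literal 1"; '$11' would be read as group 11.
// Single quotes matter here: in double quotes PHP would try to interpolate ${1}.
$item = preg_replace('/(item)/', '${1}1', 'item');

// In a double-quoted string the backslash form has to be written as \\1.
$wrapped = preg_replace('/(a)/', "[\\1]", 'a');

echo $date, "\n", $item, "\n", $wrapped, "\n";
```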