Regular Expressions for PHP Beginners
1. Delimiters
What is a delimiter?
A delimiter marks a boundary: the expression must be written inside that boundary.
// is the usual delimiter pair in regular expressions. The expression is written between the two slashes, for example /a-z/.
2. Which characters can be delimiters?
Any character other than letters, digits, the backslash \ and whitespace can be a delimiter, such as | |, / /, { }, ! !, etc. Unless there is a special need, we use / as the delimiter.
3. Composition of regular expressions
A standard regular expression consists of 3 parts: (1) delimiter (2) expression (3) modifier
Delimiter: The delimiter wraps the expression. It can be any character that is not a letter, a digit, a backslash or whitespace. The most commonly used delimiter is "/".
Expression: The expression is made up of special characters (metacharacters) and non-special characters (literal text characters).
Modifier: Modifiers in PHP regular expressions change how the expression behaves, so that it better fits what you need. (Note: modifiers are case-sensitive, which means "e" is not the same as "E".)
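To make the three parts concrete, here is a minimal sketch using preg_match(), PHP's matching function. The sample text and patterns are assumptions chosen for illustration. The same expression is written with three different delimiters, each followed by the i modifier.

```php
<?php
// Each pattern is: delimiter + expression + delimiter + optional modifier(s).
$subject = 'Hello PHP';   // hypothetical sample text

$slash  = '/php/i';       // "/" delimiters, expression "php", modifier "i"
$hash   = '#php#i';       // "#" delimiters, same expression and modifier
$braces = '{php}i';       // bracket-style delimiters: "{" opens, "}" closes

foreach ([$slash, $hash, $braces] as $pattern) {
    // preg_match() returns 1 when the pattern matches and 0 when it does not.
    echo $pattern, ' => ', preg_match($pattern, $subject), PHP_EOL;
}

// Without the "i" modifier, the lowercase expression does not match "PHP".
echo '/php/ => ', preg_match('/php/', $subject), PHP_EOL;
```

All three delimiter styles behave the same way; choosing something other than / is mostly useful when the expression itself contains many slashes.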
What are the modifiers in regular expressions?
Types and introduction of PHP regular expression modifiers:
◆i: Matching becomes case-insensitive, i.e. "a" and "A" are treated the same.
◆m: By default, the start "^" and end "$" refer to the whole string. With "m", they also match at the start and end of each line inside the string.
◆s: By default, "." matches any character except the newline character. With "s", "." matches any character, including the newline.
◆x: Whitespace characters in the expression are ignored unless they are escaped or appear inside a character class.
◆e: Only meaningful for replacement: the replacement string is evaluated as PHP code. This modifier was deprecated in PHP 5.5 and removed in PHP 7.0; use preg_replace_callback instead.
◆A: The match must begin at the start of the subject string. For example, "/a/A" matches "abcd", but "/b/A" does not.
◆D: The counterpart of "m" for "$": with this modifier, "$" matches only at the absolute end of the string, not before a trailing newline. It is off by default and is ignored when "m" is set.
◆U: Inverts "greedy mode": quantifiers become non-greedy by default, and a question mark after a quantifier makes it greedy again.
Atoms in regular expressions
The atom is the smallest unit in a regular expression. Put simply, an atom is the content that needs to be matched. A valid regular expression must contain at least one atom.
Explanation: the spaces, carriage returns, line feeds, 0-9, A-Za-z, Chinese characters, punctuation marks and special symbols we see are all atoms. Before the atom examples, let's first introduce a function, preg_match.
Syntax: int preg_match (string $regular, string $string[, array &$result])
These are the most commonly used parameters of preg_match. The two remaining parameters are rarely needed, so they are not listed here.
Let’s prove it through experiments:
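A minimal sketch of such an experiment follows. The values of $zz and $string are assumptions chosen for illustration.

```php
<?php
// $zz holds the regular expression, $string the text to test.
// Both values are illustrative assumptions.
$zz     = '/wq/';
$string = 'ssssswqaaaaaa';

// preg_match() returns 1 if the pattern matches, 0 if it does not,
// and false on error. The matched text is stored in $result[0].
if (preg_match($zz, $string, $result)) {
    echo 'Matched, the result is: ';
    var_dump($result);
} else {
    echo 'No match';
}
```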
Note: $zz is the regular expression and $string is the string to test. The example checks whether the string matches the regular expression. If it does, the match result is output; if it does not, a message is output instead.
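With preg_match available, the modifiers described earlier can be tried out in the same way. The following is only a sketch; the sample strings and patterns are assumptions, not taken from the original examples.

```php
<?php
// Hypothetical sample text containing a newline.
$text = "First line\nsecond LINE";

// i: case-insensitive matching.
$i = preg_match('/line$/i', $text);            // 1: "LINE" matches "line"

// m: "^" and "$" also match at the start and end of each line.
$noM = preg_match('/^second/', $text);         // 0: "^" only matches at the very start
$m   = preg_match('/^second/m', $text);        // 1: "^" also matches after the newline

// s: "." also matches the newline character.
$noS = preg_match('/line.second/', $text);     // 0: "." does not match "\n"
$s   = preg_match('/line.second/s', $text);    // 1

// x: unescaped whitespace in the pattern is ignored.
$x = preg_match('/ second \s LINE /x', $text); // 1: same as /second\sLINE/

// A: the match must begin at the start of the subject.
$a = preg_match('/line/A', $text);             // 0: "line" is not at the start

// U: quantifiers become non-greedy by default.
preg_match('/<.+>/',  '<b>bold</b>', $greedy); // $greedy[0] is "<b>bold</b>"
preg_match('/<.+>/U', '<b>bold</b>', $lazy);   // $lazy[0] is "<b>"

var_dump($i, $noM, $m, $noS, $s, $x, $a, $greedy[0], $lazy[0]);
```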
Specially identified atoms
\d Matches 0-9
\D Matches any character except 0-9
\w Matches a-z, A-Z, 0-9 and _
\W The opposite of \w
\s Matches any whitespace character
\S Matches any non-whitespace character
[] A specified range of atoms
\w, \s, \W and \S can be hard to remember, so each has an equivalent written with [] that has the same effect. For example, \d is the same as [0-9], \D is the same as [^0-9], and \w is the same as [a-zA-Z0-9_].
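The special atoms and their bracket equivalents can be compared side by side. This is a minimal sketch with an assumed sample string; each call stores the first match in the third argument.

```php
<?php
// Hypothetical sample string mixing letters, an underscore, a digit and spaces.
$string = 'php_7 is  fast';

preg_match('/\d/', $string, $digit);             // first digit
preg_match('/[0-9]/', $string, $digitRange);     // same atom written as a range
preg_match('/\w+/', $string, $word);             // first run of a-z A-Z 0-9 _
preg_match('/[a-zA-Z0-9_]+/', $string, $wordRange);
preg_match('/\D+/', $string, $nonDigit);         // first run of non-digits
preg_match('/\s+/', $string, $space);            // first run of whitespace
preg_match('/\S+/', $string, $nonSpace);         // first run of non-whitespace
preg_match('/\W+/', $string, $nonWord);          // first run of non-word characters

var_dump($digit[0], $word[0], $nonDigit[0], $space[0], $nonSpace[0], $nonWord[0]);
```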
+ Matches the preceding character at least 1 time
* Matches the preceding character 0 or more times
? Matches the preceding character 0 or 1 times (optional)
. (dot) Matches any character except \n
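The four metacharacters so far, +, *, ? and the dot, can be checked with a few short patterns. The patterns and strings here are assumptions chosen for illustration.

```php
<?php
// + : the preceding character must appear at least once.
$plusYes = preg_match('/ab+c/', 'abbbc');      // 1
$plusNo  = preg_match('/ab+c/', 'ac');         // 0: there is no "b" at all

// * : the preceding character may appear zero or more times.
$star = preg_match('/ab*c/', 'ac');            // 1

// ? : the preceding character may appear zero times or once.
$optYes = preg_match('/colou?r/', 'color');    // 1
$optNo  = preg_match('/colou?r/', 'colouur');  // 0: two "u" is one too many

// . : any single character except the newline "\n".
$dotYes = preg_match('/a.c/', 'a-c');          // 1
$dotNo  = preg_match('/a.c/', "a\nc");         // 0

var_dump($plusYes, $plusNo, $star, $optYes, $optNo, $dotYes, $dotNo);
```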
| (vertical bar) Means "or"; it has the lowest priority
The following was found through experiments:
1. At first, my intention was to match abccd or abbcd. However, when $string1 and $string2 are matched, the results are abc and bcd.
2. With "or" matching, the results are abc or bcd. The | has a lower priority than characters written next to each other, so each alternative is the whole run of characters on its side of the bar.
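The behaviour described in these two points can be reproduced with the sketch below. The pattern /abc|bcd/ and the two strings are assumptions reconstructed from the results mentioned above. The second pattern shows one way to get the originally intended match by grouping the alternatives in brackets.

```php
<?php
// Assumed reconstruction of the experiment.
$zz      = '/abc|bcd/';
$string1 = 'abccd';
$string2 = 'abbcd';

// "|" splits the whole pattern into "abc" OR "bcd".
preg_match($zz, $string1, $result1);   // $result1[0] is "abc"
preg_match($zz, $string2, $result2);   // $result2[0] is "bcd"

// Brackets limit the alternation to a single position: "ab", then c or b, then "cd".
$grouped = '/ab(c|b)cd/';
preg_match($grouped, $string1, $grouped1);   // $grouped1[0] is "abccd"
preg_match($grouped, $string2, $grouped2);   // $grouped2[0] is "abbcd"

var_dump($result1[0], $result2[0], $grouped1[0], $grouped2[0]);
```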
^ (circumflex) The string must start with the characters after ^
The following conclusions were found through experiments:
1. $string1 matched successfully, $string2 did not.
2. $string1 starts with the specified characters.
3. $string2 does not start with the characters after ^.
4. In plain words, this regular expression means: start with "Li Wenkai is so handsome", followed by at least one character from a-zA-Z0-9_.
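A sketch of this experiment is shown here. The pattern follows point 4, with the phrase written in English; the two test strings are assumptions.

```php
<?php
// "^" anchors the phrase to the start of the string; \w+ needs at least
// one character from a-zA-Z0-9_ right after it.
$zz = '/^Li Wenkai is so handsome\w+/';

$string1 = 'Li Wenkai is so handsome123abc';                 // starts with the phrase
$string2 = 'Everyone says Li Wenkai is so handsome123abc';   // phrase is not at the start

$first  = preg_match($zz, $string1);   // 1
$second = preg_match($zz, $string2);   // 0

var_dump($first, $second);
```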
$ (dollar sign) The string must end with the characters before $
Note:
$string1 matched successfully and $string2 did not. What comes before $ is \d+ followed by the Chinese word for "effort", so the string must end with that whole sequence. \d stands for a digit 0-9, and the + sign means at least one digit.
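The same idea as a sketch: the English word "effort" stands in for the Chinese word of the original pattern, and both test strings are assumptions.

```php
<?php
// "$" anchors the sequence "digits + effort" to the end of the string.
$zz = '/\d+effort$/';

$string1 = 'I gave 100effort';     // ends with digits followed by "effort"
$string2 = '100effort pays off';   // the sequence is there, but not at the end

$first  = preg_match($zz, $string1);   // 1
$second = preg_match($zz, $string2);   // 0

var_dump($first, $second);
```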
{m} The preceding atom must appear exactly m times
{n,m} The preceding atom must appear at least n times and at most m times
Note: In the example \d{1,3}, 0-9 may appear 1, 2 or 3 times. Any other number of times is wrong.
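A short sketch of both counted forms follows. The ^ and $ anchors are an added assumption: without them, a longer run of digits would still contain a shorter run that satisfies the count.

```php
<?php
// {m}: exactly three digits, and nothing else in the string.
$exact = '/^\d{3}$/';
$exact3 = preg_match($exact, '123');     // 1
$exact2 = preg_match($exact, '12');      // 0: too few
$exact4 = preg_match($exact, '1234');    // 0: too many

// {n,m}: between one and three digits.
$range = '/^\d{1,3}$/';
$range1 = preg_match($range, '7');       // 1
$range2 = preg_match($range, '42');      // 1
$range3 = preg_match($range, '123');     // 1
$range4 = preg_match($range, '1234');    // 0: four digits is outside 1 to 3

var_dump($exact3, $exact2, $exact4, $range1, $range2, $range3, $range4);
```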
{m,} At least m times, with no upper limit
In the example \d{2,}, the 0-9 after the word "drink" must appear at least twice, and there is no upper limit. Therefore $string1 fails to match, while $string2 and $string3 match successfully.
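This result can be reproduced with a sketch like the one below. The word "drink" is written in English, and the three strings are assumptions chosen to give one failure and two successes.

```php
<?php
// "drink" must be followed by at least two digits.
$zz = '/drink\d{2,}/';

$string1 = 'drink1cup';        // only one digit
$string2 = 'drink12cups';      // two digits
$string3 = 'drink12345cups';   // five digits, still allowed

$first  = preg_match($zz, $string1);   // 0
$second = preg_match($zz, $string2);   // 1
$third  = preg_match($zz, $string3);   // 1

var_dump($first, $second, $third);
```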
Tips for regular expressions
Write a little, test a little. A regular expression needs constant checking, so use preg_match to check whether each piece matches. If it does, write the next piece, and continue until the whole expression is written and everything matches.
Next, let's write a complete example: a regular expression for email.
Step one: list all the email formats
liwenkai@phpxy.com
iwenkai@corp.baidu.cm
iwenkai@126.com
_w_k@xxx.com
2345@qq.com
First, match the characters before @ with \w+ (because they are 0-9A-Za-z_).
Second, follow that with an @ character.
Third, write [a-zA-Z0-9-]+, because main domain names such as qq and 126 cannot contain underscores.
What comes before the email suffix is usually something like corp.baidu. or 126. so we can write: ([a-zA-Z0-9-]+\.){1,2}
The \. escapes the dot so that it means a literal dot. The group in brackets must repeat at least once and at most twice.
Finally, follow it with com|cn|org|gov.cn|net|edu.cn and so on, escaping the dots in gov.cn and edu.cn in the same way.
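The "write a little, test a little" approach for the email pattern could look like the sketch below. The helper allMatch() is a hypothetical name introduced for illustration, the ^ and $ anchors are an added assumption, and only four of the listed addresses are used as test data. Treat the final pattern as a learning exercise rather than a complete email validator.

```php
<?php
// Hypothetical helper: true only if every address matches the pattern.
function allMatch(string $pattern, array $addresses): bool
{
    foreach ($addresses as $address) {
        if (preg_match($pattern, $address) !== 1) {
            return false;
        }
    }
    return true;
}

$emails = ['liwenkai@phpxy.com', 'iwenkai@126.com', '_w_k@xxx.com', '2345@qq.com'];

// Step 1: the characters before "@".
$step1 = '/^\w+/';
// Step 2: add the "@" character.
$step2 = '/^\w+@/';
// Step 3: add one or two domain parts, each ending in an escaped dot.
$step3 = '/^\w+@([a-zA-Z0-9-]+\.){1,2}/';
// Step 4: add the suffix list and anchor the end of the string.
$step4 = '/^\w+@([a-zA-Z0-9-]+\.){1,2}(com|cn|org|gov\.cn|net|edu\.cn)$/';

// Test after every step before writing the next piece.
foreach ([$step1, $step2, $step3, $step4] as $pattern) {
    echo $pattern, ' => ', allMatch($pattern, $emails) ? 'all match' : 'some fail', PHP_EOL;
}

// An underscore in the domain part is rejected by the finished pattern.
var_dump(preg_match($step4, 'user@bad_domain.com'));
```

Testing each step separately makes it clear which piece is responsible when an address stops matching.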