The regular expression language consists of two basic character types: literal (normal) text characters and metacharacters.
Metacharacters are what give regular expressions their processing power. A metacharacter can be a single character placed in `[ ]` (for example, `[a]` matches the single lowercase character `a`), or it can stand for a set of characters (for example, `[a-d]` matches any one of `a`, `b`, `c`, `d`, and `\w` matches any English letter, digit, or underscore). Common metacharacters are as follows:
Characters | Description | Special instructions |
---|---|---|
`.` | Matches any character except the newline character (`\n`) | ~ |
`[abcde]` | Matches any one of the characters a b c d e | All characters inside the brackets are in an "or" relationship |
`[a-h]` | Matches any character between a and h | ~ |
`[^fgh]` | Matches any character that is not one of f, g, h | Adding `^` before the first character inside the square brackets `[ ]` indicates negation: it matches any character that does not appear inside the brackets |
`\w` | Matches any one uppercase or lowercase English letter, digit 0 to 9, or underscore; equivalent to `[a-zA-Z0-9_]` | ~ |
`\W` | The opposite of `\w`; equivalent to `[^a-zA-Z0-9_]` | ~ |
`\s` | Matches any whitespace character; equivalent to `[\f\n\r\t\v]` | ~ |
`\S` | The opposite of `\s`; equivalent to `[^\s]` | ~ |
`\d` | Matches any single digit between 0 and 9; equivalent to `[0-9]` | ~ |
`\D` | The opposite of `\d`; equivalent to `[^0-9]` | ~ |
 | Matches any single Chinese character | The Chinese characters are expressed by their Unicode encoding here |
`\b` | Matches the beginning or end of a word | ~ |
`^` | Matches the beginning of the string | When placed before the first character inside square brackets, it means negation instead |
`$` | Matches the end of the string | ~ |
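To make the table concrete, here is a minimal sketch that runs several of the listed metacharacters against short sample strings. It assumes Python's `re` module as the engine (the article itself does not name one), and the sample strings are invented for illustration. The `re.ASCII` flag is passed so that `\w`, `\d` and `\s` stay close to the ASCII-only equivalents given in the table.

```python
import re

# (pattern, sample text) pairs; the sample texts are made up for this sketch.
samples = [
    (r".", "a\nb"),                  # any character except newline
    (r"[a-h]", "hi"),                # a range inside square brackets
    (r"[^fgh]", "fig"),              # negated character class
    (r"\w", "a_1-"),                 # letter, digit or underscore
    (r"\W", "a_1-"),                 # opposite of \w
    (r"\s", "a b\tc"),               # whitespace
    (r"\d", "a1b22"),                # digit
    (r"\D", "a1b22"),                # opposite of \d
    (r"\bcat\b", "cat concat cat"),  # whole word only
]

# re.ASCII keeps \w, \d, \s restricted to ASCII, as in the table's equivalents.
results = {p: re.findall(p, text, flags=re.ASCII) for p, text in samples}

for pattern, found in results.items():
    print(f"{pattern!r:12} -> {found}")
```

Note that `\bcat\b` skips the `cat` inside `concat`, because there is no word boundary between `n` and `c`.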
Unit: a quantifier applies to the unit immediately before it. If what precedes it is a single character, that character is the unit; if it is a group in parentheses `( )`, the whole group is the unit.
Parentheses `( )` form a group. For example, `^(13)[4-9]\d{8}$` represents an 11-digit mobile phone number that starts with `13`, followed by a digit from 4 to 9 and then 8 more digits.
`abc+` means the last letter `c` appears 1 or more times; `(abc)+` means the entire string `abc` appears 1 or more times, as in `abcabcabc`.
`|` indicates an "or" relationship. For example, `z|j|q` matches any one of the letters z, j, q; it is in fact equivalent to `[zjq]`.
`ab|cd|ef` means: either `ab`, `cd` or `ef`.
`a(b|cd|e)f` means: starts with `a`, then either `b`, `cd` or `e`, and ends with `f`.
The boundary of `|` (or) is the parentheses `( )`.
How should the rule `[0-9A-Z.?]` be understood? When `.` and `?` appear inside square brackets, they become normal characters, namely a dot and a question mark. You can think of `[ ]` as having a higher priority than `.` and `?`. The same applies to `?aaa.bbb` written inside square brackets: remember that `.` and `?` are treated entirely as ordinary characters there.
The multi-selection structure is simply the use of the metacharacter `|` (or). Its range is delimited by: the beginning of the expression, the end of the expression, and parentheses.
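The grouping and alternation rules above can be checked with a short sketch. It assumes Python's `re` module, and every test string (including the two phone numbers) is invented for illustration.

```python
import re

def full(pattern, text):
    """True when the whole of text matches pattern."""
    return re.fullmatch(pattern, text) is not None

# The quantifier applies to one character or to a whole group.
last_char_repeats = full(r"abc+", "abccc")        # True: only c repeats
group_repeats = full(r"(abc)+", "abcabcabc")      # True: abc repeats
no_group = full(r"abc+", "abcabcabc")             # False: abc+ cannot repeat abc

# z|j|q behaves like the character class [zjq].
alt = re.findall(r"z|j|q", "jazz quiz")
cls = re.findall(r"[zjq]", "jazz quiz")

# Alternation limited by parentheses.
candidates = ["abf", "acdf", "aef", "acf", "abcdf"]
grouped = [s for s in candidates if full(r"a(b|cd|e)f", s)]

# Inside [ ], the dot and question mark are literal characters.
literal = re.findall(r"[0-9A-Z.?]", "a.B?9!")

# The mobile number pattern from the text (sample numbers are made up).
mobile_ok = full(r"^(13)[4-9]\d{8}$", "13512345678")   # True
mobile_bad = full(r"^(13)[4-9]\d{8}$", "13112345678")  # False: third digit is 1

print(alt == cls, grouped, literal)
```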
Characters | Description | Special instructions |
---|---|---|
`*` | Matches the preceding element 0 or more times, equivalent to `{0,}` | ~ |
`?` | Matches the preceding element 0 or 1 time, equivalent to `{0,1}` | ~ |
`+` | Matches the preceding element at least 1 time, equivalent to `{1,}` | ~ |
`{n}` | Matches the preceding element exactly n times | ~ |
`{n,}` | Matches the preceding element at least n times | ~ |
`{n,m}` | Matches the preceding element n to m times | ~ |
`\b` | Matches a word boundary | ~ |
`^` | The string must start with the specified character | ~ |
`$` | The string must end with the specified character | ~ |
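As a quick illustration of the quantifier rows, the sketch below checks which of four candidate strings each pattern accepts in full. Python's `re` is an assumption here, and the candidate strings and the `a...b` patterns are invented for the example.

```python
import re

candidates = ["b", "ab", "aab", "aaab"]
patterns = [r"a*b", r"a?b", r"a+b", r"a{2}b", r"a{2,}b", r"a{1,2}b"]

# For each pattern, keep the candidates it matches from start to end.
accepted = {
    p: [s for s in candidates if re.fullmatch(p, s)]
    for p in patterns
}

for pattern, strings in accepted.items():
    print(f"{pattern:8} accepts {strings}")

# ^ and $ pin the match to the start or end of the string.
starts_with_ab = re.search(r"^ab", "abc") is not None   # True
ends_with_ab = re.search(r"ab$", "abc") is not None     # False
```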
Regular | Meaning |
---|---|
`Windows98|Windows2000|WindowsXP` | Matches `Windows98` or `Windows2000` or `WindowsXP` |
`^Windows98|Windows2000|WindowsXP$` | Starts with `Windows98`, or contains `Windows2000`, or ends with `WindowsXP`. Note that `^` and `$` are both included in the range of `|`, because the boundaries of `|` are only: beginning, end, parentheses |
`Windows(98|2000|XP)` | `Windows` followed by `98` or `2000` or `XP` |
Summary: a multi-selection structure can span many characters, but it cannot extend beyond the boundaries of its enclosing parentheses.
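The effect of the `|` boundaries can be seen in a small sketch, assuming Python's `re` module. The sample sentences are invented, and the fully anchored variant `^Windows(98|2000|XP)$` is an addition for contrast, not a pattern from the table.

```python
import re

loose = re.compile(r"^Windows98|Windows2000|WindowsXP$")
anchored = re.compile(r"^Windows(98|2000|XP)$")  # hypothetical variant for contrast

def hit(regex, text):
    return regex.search(text) is not None

# ^ belongs only to the first branch and $ only to the last one.
loose_results = {
    "Windows98 SE": hit(loose, "Windows98 SE"),              # starts with Windows98
    "my Windows2000 box": hit(loose, "my Windows2000 box"),  # contains Windows2000
    "use WindowsXP": hit(loose, "use WindowsXP"),            # ends with WindowsXP
    "old Windows98": hit(loose, "old Windows98"),            # no branch applies
}

# With parentheses, ^ and $ apply to the whole alternative.
anchored_results = {
    "Windows2000": hit(anchored, "Windows2000"),
    "my Windows2000 box": hit(anchored, "my Windows2000 box"),
}

print(loose_results)
print(anchored_results)
```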
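`(\d{1,3}\.){3}\d{1,3}` is a simple IP address matching expression. It accepts any four groups of 1 to 3 digits, so it also accepts values above 255.
`((2[0-4]\d|25[0-5]|[01]?\d\d?)\.){3}(2[0-4]\d|25[0-5]|[01]?\d\d?)` restricts each of the four parts to the range 0 to 255.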
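The difference between the two IP expressions shows up as soon as a part exceeds 255. The sketch below assumes Python's `re` and uses `fullmatch` so that the whole string must be an address; the sample addresses are invented.

```python
import re

simple = re.compile(r"(\d{1,3}\.){3}\d{1,3}")
strict = re.compile(
    r"((2[0-4]\d|25[0-5]|[01]?\d\d?)\.){3}(2[0-4]\d|25[0-5]|[01]?\d\d?)"
)

addresses = ["192.168.1.1", "255.255.255.255", "1.2.3.4", "256.0.0.1", "999.1.1.1"]

# Each entry maps an address to (accepted by simple, accepted by strict).
verdicts = {
    a: (simple.fullmatch(a) is not None, strict.fullmatch(a) is not None)
    for a in addresses
}

for address, (loose_ok, strict_ok) in verdicts.items():
    print(f"{address:16} simple={loose_ok} strict={strict_ok}")
```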
Groups are numbered using the left parenthesis of each group as the marker, counting from left to right: the first group is number 1, the second is number 2, and so on.
Example: a back-reference can be used to match repeated words.
In a regular expression, parentheses earlier in the pattern mark off a group, and the content matched by that group can then be referenced later in the pattern with `\1`, `\2`, and so on (the first pair of parentheses is `\1`, ...). If parentheses are nested inside parentheses, as in `(\w(.?))`, remember to count the groups by their `(` from left to right.
Advanced 3 - Look Around (Zero Width Assertion)
Look-around matches only a position, just like `^` and `$`. Look-around does not consume characters.
Look-around is divided into four kinds:
`(?=exp)`: what follows the position can match `exp`. For example, `(?=\d)`: the right side of the current position is a digit.
`(?!exp)`: what follows the position cannot match `exp`. For example, `(?!\d)`: the right side of the current position is not a digit.
`(?<=exp)`: what precedes the position can match `exp`. For example, `(?<=\d)`: the left side of the current position is a digit.
`(?<!exp)`: what precedes the position cannot match `exp`. For example, `(?<!\d)`: the left side of the current position is not a digit.
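Group numbering, back-references and the four look-around forms can all be tried in one short sketch. It assumes Python's `re` module; the date pattern, the doubled-word pattern `\b(\w+)\s+\1\b` and the sample strings are illustrative choices, not expressions taken from the article.

```python
import re

# Groups are numbered by the position of their opening parenthesis.
m = re.match(r"((\d{4})-(\d{2}))-(\d{2})", "2024-05-17")
numbered = [m.group(i) for i in range(1, 5)]

# \1 must repeat exactly what group 1 matched: finds doubled words.
doubled = re.findall(r"\b(\w+)\s+\1\b", "this is is a a test")

text = "a1b c22"
followed_by_digit = re.findall(r"\w(?=\d)", text)       # look-ahead
not_followed_by_digit = re.findall(r"\w(?!\d)", text)   # negative look-ahead
preceded_by_digit = re.findall(r"(?<=\d)\w", text)      # look-behind
not_preceded_by_digit = re.findall(r"(?<!\d)\w", text)  # negative look-behind

# A bare look-around matches positions only and consumes no characters.
positions = [hit.start() for hit in re.finditer(r"(?=\d)", text)]
widths = [len(hit.group()) for hit in re.finditer(r"(?=\d)", text)]

print(numbered, doubled, positions)
```

Here `numbered` lists the outer group first (`2024-05`), because its `(` comes first, and every entry in `widths` is 0, which is what "zero width" refers to.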
Advanced 4 - Greedy and Lazy Matching
When a regular expression contains quantifiers that allow repetition (`*`, `{3,12}`, etc.), the usual behavior is to match as many characters as possible.
Take the regular expression `a.*b`, which matches the longest string that starts with `a` and ends with `b`. If you use it to search `aabab`, it matches the entire string `aabab`. This is called greedy matching.
Lazy matching: `.*?` means matching any number of repetitions, but using the fewest repetitions that still allow the whole match to succeed.
Take `a.*?b`, which matches from a starting `a` up to the nearest following `b`. Applied to `aabab`, it matches `aab` and `ab`.
Summary: the difference between greedy and lazy mode is that lazy mode has one more question mark `?` after the quantifier.
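The `aabab` example can be reproduced directly. This sketch assumes Python's `re` module; the `{1,3}` comparison at the end is an added illustration of the same rule on a counted quantifier.

```python
import re

subject = "aabab"

greedy = re.findall(r"a.*b", subject)   # takes as much as possible
lazy = re.findall(r"a.*?b", subject)    # takes as little as possible

# The same ? suffix works on other quantifiers (added example).
greedy_count = re.match(r"a{1,3}", "aaaa").group()
lazy_count = re.match(r"a{1,3}?", "aaaa").group()

print(greedy)        # ['aabab']
print(lazy)          # ['aab', 'ab']
print(greedy_count)  # aaa
print(lazy_count)    # a
```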
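Advanced 5 - Priority of pattern matching
When using regular expressions, you need to pay attention to the order of matching. Operators of the same priority are usually evaluated from left to right, and operators of different priorities are evaluated higher first, then lower. The matching priority of the various operators, from high to low, is shown in the following table.
Order | Metacharacters | Description |
---|---|---|
1 | `\` | Escape character |
2 | `()`, `(?:)`, `(?=)`, `[]` | Pattern units and character classes (atom tables) |
3 | `*`, `+`, `?`, `{n}`, `{n,}`, `{n,m}` | Repetition |
4 | `^`, `$`, `\b`, `\B`, `\A`, `\Z` | Boundary restrictions |
5 | `|` | Pattern selection |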
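A few contrasting patterns make the priority order visible. This is a sketch assuming Python's `re` module; the patterns and sample strings are invented to exercise one table row each.

```python
import re

def full(pattern, text):
    return re.fullmatch(pattern, text) is not None

def found(pattern, text):
    return re.search(pattern, text) is not None

# Row 1: the escape is applied first, so \. is a literal dot.
escaped_dot = (found(r"a\.b", "a.b"), found(r"a\.b", "axb"))    # (True, False)
plain_dot = found(r"a.b", "axb")                                # True

# Rows 2 and 3: a quantifier repeats the unit before it.
star_on_char = (full(r"ab*", "abbb"), full(r"ab*", "abab"))     # (True, False)
star_on_group = (full(r"(ab)*", "abab"), full(r"(ab)*", "abbb"))  # (True, False)

# Rows 4 and 5: | has the lowest priority, so each anchor
# stays inside its own branch unless parentheses are added.
loose_alt = (found(r"^ab|cd$", "abX"), found(r"^ab|cd$", "Xcd"))        # (True, True)
tight_alt = (found(r"^(ab|cd)$", "abX"), found(r"^(ab|cd)$", "cd"))     # (False, True)

print(escaped_dot, plain_dot, star_on_char, star_on_group, loose_alt, tight_alt)
```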
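Question 1: How should a regular expression be written to match the `\$` in `333333\$33\33333`?
Question 2: If the `preg_match` function in PHP is given the expression in single quotes and in double quotes to match the `\$` above, how should each be written?
Answer:
Single quotes: `'/\\\\\\$/'` (for the convenience of viewing, we split it into `'/\\ \\ \\ $/'`).
Double quotes: `"/\\\\\\\$/"` (for the convenience of viewing, we split it into `"/\\ \\ \\ \$/"`).
Explanation:
The regular expression itself is `\\\$`, which contains 3 `\`. In a single-quoted string, each `\` must itself be escaped with another `\`, so we need 6 `\` to generate the expression.
Besides escaping `\`, double quotes also need one more `\` to escape `$`, so it requires 7 `\`.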
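The backslash counting can be cross-checked outside PHP. The sketch below is Python, not PHP, so it only illustrates the principle: Python's ordinary string literals escape `\` much like PHP single quotes do, which gives the same count of 6, while Python has no `$` interpolation and therefore no counterpart to the 7-backslash double-quoted form. The subject string is a shortened sample.

```python
import re

# Sample text that contains a backslash followed by a dollar sign.
subject = r"333333\$33"

# The regular expression itself: \\ matches one backslash, \$ matches one dollar.
raw_pattern = r"\\\$"

# The same expression from an ordinary string literal: every backslash is
# doubled, which gives six backslashes in the source code.
plain_pattern = "\\\\\\$"

same_regex = raw_pattern == plain_pattern
backslashes_in_regex = raw_pattern.count("\\")

match = re.search(raw_pattern, subject)
matched_text = match.group()

print(same_regex, backslashes_in_regex, matched_text)
```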