Detailed explanation of regular expressions

步履不停
Release: 2023-04-06 22:52:01

The regular expression language consists of two basic character types: literal (normal) text characters and metacharacters.


Metacharacters are what give regular expressions their processing power. A metacharacter can be any single character placed in [ ] (for example, [a] matches the single lowercase character a) or a sequence of characters (for example, [a-d] matches any one of the characters a, b, c, d, and \w matches any English letter, digit, or underscore). Common metacharacters are as follows:

Common metacharacters

Characters    Description    Special instructions
.    Matches any character except the newline character (\n)    ~
[abcde]    Matches any one of the characters a, b, c, d, e    The characters are in an "or" relationship
[a-h]    Matches any one character between a and h    ~
[^fgh]    Matches any one character that is not f, g, or h    Adding ^ before the first character inside the square brackets [ ] indicates negation: the class matches any character that does not appear inside the brackets
\w    Matches any one uppercase or lowercase English letter, digit 0 to 9, or underscore; equivalent to [a-zA-Z0-9_]    ~
\W    The opposite of \w; equivalent to [^a-zA-Z0-9_]    ~
\s    Matches any whitespace character; equivalent to [ \f\n\r\t\v]    ~
\S    The opposite of \s; equivalent to [^\s]    ~
\d    Matches any single digit between 0 and 9; equivalent to [0-9]    ~
\D    The opposite of \d; equivalent to [^0-9]    ~
[\u4e00-\u9fa5]    Matches any single Chinese character    The range is given as Unicode code points
\b    Matches the beginning or end of a word    ~
^    Matches the beginning of the string    When placed before the first character inside square brackets, it means negation instead
$    Matches the end of the string    ~

Regular expression qualifier

Function: a qualifier limits the number of occurrences of the unit that precedes it.

Unit:
  1. If the qualifier is preceded by a single character, that character is the unit.
  2. If the qualifier is preceded by a string enclosed in parentheses, the entire parenthesized group is the unit.

The metacharacters above each match a single character. To match multiple characters at once, you need qualifiers. Common qualifiers are *, +, ?, {n}, {n,}, and {n,m}, where n and m both represent integers.
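The character classes in the metacharacter table can be tried out with a short sketch. Python's re module is used here purely for illustration (the article itself does not name a language for this part), and the sample characters are arbitrary. One caveat under that assumption: in Python 3, \w and \d also match non-ASCII letters and digits unless the re.ASCII flag is passed, so the "equivalent to" column holds exactly only in ASCII mode.

```python
import re

# Each pattern should match exactly one character from its class.
samples = {
    r"[a-h]": "c",             # a range inside square brackets
    r"[^fgh]": "a",            # a leading ^ inside brackets negates the class
    r"\w": "_",                # letter, digit, or underscore
    r"\d": "7",                # a single digit
    r"\s": "\t",               # a whitespace character
    r"[\u4e00-\u9fa5]": "中",  # a Chinese character inside the given range
}

results = {p: re.fullmatch(p, ch) is not None for p, ch in samples.items()}

# The uppercase classes reject exactly what their lowercase twins accept.
negated = [
    re.fullmatch(r"\W", "_") is not None,
    re.fullmatch(r"\D", "7") is not None,
    re.fullmatch(r"\S", "\t") is not None,
]

print(results)
print(negated)
```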

Explanation - Special case

  1. You can surround multiple metacharacters or literal text characters with parentheses to form a group. For example, ^(13)[4-9]\d{8}$ matches a mobile phone number that starts with 13, followed by a digit from 4 to 9 and eight more digits.
    1. abcabcabc+ means the last letter c appears 1 or more times;
    2. (abcabcabc)+ means the entire string abcabcabc appears 1 or more times.
  2. You can use | to express an "or" relationship. For example, z|j|q matches any one of the letters z, j, q. It is equivalent to [zjq].
    1. ab|cd|ef means: either ab, cd, or ef.
    2. a(b|cd|e)f means: starting with a, followed by either b, cd, or e, and ending with f.
    3. Summary: the only boundary of | (or) is parentheses ( ).
  3. How should the rule [0-9A-Z.?] be understood?
    1. When . and ? appear inside square brackets, they become normal characters: a literal dot and a literal question mark. You can think of [ ] as having higher priority than . and ?.
    2. The class matches one character at a time. Searched against the string ?aaa.bbb, it finds the ? and the . as ordinary characters; the lowercase letters are not matched, because the class only covers 0-9, A-Z, the dot, and the question mark.
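The three special cases above can be checked with a small sketch. It assumes Python's re module as the test bed, and every sample string (including the phone number) is made up for illustration.

```python
import re

# Case 1: a quantifier applies to the unit before it.
phone_ok = re.fullmatch(r"^(13)[4-9]\d{8}$", "13912345678") is not None
last_char_repeats = re.fullmatch(r"abcabcabc+", "abcabcabccc") is not None
group_repeats = re.fullmatch(r"(abcabcabc)+", "abcabcabcabcabcabc") is not None
no_group_fails = re.fullmatch(r"abcabcabc+", "abcabcabcabcabcabc") is not None

# Case 2: parentheses are the boundary of | (or).
words = ["af", "abf", "acdf", "aef", "abcdf"]
alternation_hits = [w for w in words if re.fullmatch(r"a(b|cd|e)f", w)]

# Case 3: inside square brackets, . and ? are literal characters.
literal_hits = re.findall(r"[0-9A-Z.?]", "?aaa.bbb")

print(phone_ok, last_char_repeats, group_repeats, no_group_fails)
print(alternation_hits)
print(literal_hits)
```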

Advanced 1 - Multi-selection structure

The multi-selection structure is simply the use of the metacharacter | (or).
Its range is bounded by: the beginning of the pattern, the end of the pattern, and parentheses.

Common qualifiers and boundary characters (n and m are integers):

Characters    Description    Special instructions
*    Matches the preceding unit 0 or more times    Equivalent to {0,}
?    Matches the preceding unit 0 or 1 time    Equivalent to {0,1}
+    Matches the preceding unit at least 1 time    Equivalent to {1,}
{n}    Matches the preceding unit exactly n times    ~
{n,}    Matches the preceding unit at least n times    ~
{n,m}    Matches the preceding unit n to m times    ~
\b    Matches a word boundary    ~
^    The string must start with the specified characters    ~
$    The string must end with the specified characters    ~
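To make the qualifier table concrete, the sketch below runs each qualifier against the same small set of strings. It assumes Python's re module; the helper name accepted and the sample strings are hypothetical, chosen only for this illustration.

```python
import re

candidates = ["", "a", "aa", "aaa", "aaaa"]

def accepted(pattern):
    """Return the candidates that the pattern matches in full."""
    return [s for s in candidates if re.fullmatch(pattern, s)]

by_qualifier = {p: accepted(p) for p in ["a*", "a?", "a+", "a{2}", "a{2,}", "a{2,3}"]}

# \b only matches at a word boundary, so "concat" and "cats" are skipped.
whole_words = re.findall(r"\bcat\b", "cat concat cats cat")

for pattern, hits in by_qualifier.items():
    print(pattern, hits)
print(whole_words)
```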
Regular expression    Meaning
Windows98|Windows2000|WindowsXP    Matches Windows98, Windows2000, or WindowsXP
^Windows98|Windows2000|WindowsXP$    Starts with Windows98, or contains Windows2000, or ends with WindowsXP. Note that ^ and $ each fall inside a single branch of |, because the only boundaries of | are the beginning, the end, and parentheses
Windows(98|2000|XP)    Windows followed by 98, 2000, or XP

Summary: a multi-selection structure can include many characters, but it cannot extend beyond the boundaries of its parentheses.
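The effect of those boundaries shows up when the two forms are run against the same text. This is a sketch assuming Python's re module; the sample sentence is invented, and the anchored, grouped pattern is an illustrative variant rather than one taken from the table.

```python
import re

# ^ belongs only to the first branch and $ only to the last branch.
loose = r"^Windows98|Windows2000|WindowsXP$"
# Parentheses confine the alternation, so ^ and $ apply to the whole pattern.
grouped = r"^Windows(98|2000|XP)$"

text = "I use Windows2000 daily"

loose_hit = re.search(loose, text)      # the middle branch has no anchors
grouped_hit = re.search(grouped, text)  # the whole string must be one version name

print(loose_hit.group() if loose_hit else None)
print(grouped_hit)
print(re.search(grouped, "WindowsXP") is not None)
```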

Advanced 2 - Grouping and Backreferences

Grouping

  • We already know how to repeat a single character.
  • But what if you want to repeat a string? You can use parentheses to specify a subexpression (also called a group).
  • (\d{1,3}\.){3}\d{1,3} is a simple IP address matching expression.
  • But it also matches impossible IP addresses such as 256.300.888.999. Can you write a more accurate regex?
  • ((2[0-4]\d|25[0-5]|[01]?\d\d?)\.){3}(2[0-4]\d|25[0-5]|[01]?\d\d?)
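The two IP expressions above can be compared side by side. The sketch assumes Python's re module; the names SIMPLE_IP, STRICT_IP, and is_ip are hypothetical, and fullmatch is used so that the whole string has to be an address.

```python
import re

SIMPLE_IP = re.compile(r"(\d{1,3}\.){3}\d{1,3}")
STRICT_IP = re.compile(
    r"((2[0-4]\d|25[0-5]|[01]?\d\d?)\.){3}(2[0-4]\d|25[0-5]|[01]?\d\d?)"
)

def is_ip(pattern, text):
    """True if the entire text matches the given compiled pattern."""
    return pattern.fullmatch(text) is not None

for address in ["192.168.1.1", "255.255.255.255", "256.300.888.999"]:
    print(address, is_ip(SIMPLE_IP, address), is_ip(STRICT_IP, address))
```

Each octet in the stricter pattern is limited to 200-249, 250-255, or a one- to three-digit number whose optional leading digit is 0 or 1, which is what rules out values above 255.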

Backreference

  • After parentheses specify a subexpression (a group), the text matched by that subexpression can be captured for further processing, either within the expression itself or by other programs.
  • By default, each group automatically receives a group number. The rule is: counting by each group's left parenthesis, from left to right, the first group is number 1, the second is number 2, and so on.

Example:

  • \b(\w+)\b\s+\1\b can be used to match repeated words.
  • It matches text such as:
  • where where go, tom tom happy
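The repeated-word pattern can be exercised as follows. This is a sketch assuming Python's re module; the names REPEATED_WORD and repeated_words are hypothetical, and the nested pattern (\w+(.?)) with its input "ab!" is only an illustration of how groups are numbered by their left parenthesis.

```python
import re

REPEATED_WORD = re.compile(r"\b(\w+)\b\s+\1\b")

def repeated_words(text):
    """Return each word that is immediately followed by a copy of itself."""
    return [m.group(1) for m in REPEATED_WORD.finditer(text)]

found = repeated_words("where where go, tom tom happy")
none_found = repeated_words("no repeats here")

# Nested groups are numbered by the position of their left parenthesis.
nested = re.match(r"(\w+(.?))", "ab!")
outer, inner = nested.group(1), nested.group(2)

print(found, none_found)
print(outer, inner)
```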
Straightforward explanation:

In a regular expression, use parentheses earlier in the pattern to form a group, then refer back to the content matched by those parentheses later in the pattern using \1, \2, and so on (the first pair of parentheses is \1, and so forth). If parentheses are nested inside parentheses, as in (\w+(.?)), remember to number the groups by counting the left parentheses ( from left to right.

Advanced 3 - Look Around (Zero Width Assertion)

Look around does not match any characters; it only matches specific positions in the text, similar to \b, ^, and $. Look around does not consume characters.

Look around comes in two kinds: order (lookahead) and reverse order (lookbehind).

Order
  • (?=exp): what follows the position must match exp. For example, (?=\d) means the right side of the current position is a digit.
  • (?!exp): what follows the position must not match exp. For example, (?!\d) means the right side of the current position is not a digit.

Reverse order
  • (?<=exp): what precedes the position must match exp. For example, (?<=\d) means the left side of the current position is a digit.
  • (?<!exp): what precedes the position must not match exp. For example, (?<!\d) means the left side of the current position is not a digit.
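The four look-around forms can be seen in action in a short sketch. It assumes Python's re module, and the sample strings are invented. Because look around only tests a position, the digit being tested never becomes part of the match.

```python
import re

ahead_text = "a1 b2 c d3"
behind_text = "7a b 8c"

followed_by_digit = re.findall(r"[a-z](?=\d)", ahead_text)      # (?=exp)
not_followed_by_digit = re.findall(r"[a-z](?!\d)", ahead_text)  # (?!exp)
preceded_by_digit = re.findall(r"(?<=\d)[a-z]", behind_text)    # (?<=exp)
not_preceded_by_digit = re.findall(r"(?<!\d)[a-z]", behind_text)  # (?<!exp)

# Zero width: the digit is checked but not consumed.
consumed = re.search(r"[a-z](?=\d)", "a1").group()

print(followed_by_digit, not_followed_by_digit)
print(preceded_by_digit, not_preceded_by_digit)
print(consumed)
```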
Advanced 4 - Greedy and Lazy

When a regular expression contains a quantifier that allows repetition (a code specifying a count, such as +, *, or {3,12}), the usual behavior is to match as many characters as possible. Take the regular expression a.*b: it matches the longest string that starts with a and ends with b. If you use it to search aabab, it matches the entire string aabab. This is called greedy matching.

Sometimes we need lazy matching instead, that is, matching as few characters as possible. All of the quantifiers given above can be converted into lazy form: just add a question mark ? after them. In this way, .*? means any number of repetitions, but using the fewest repetitions that still allow the whole match to succeed. a.*?b matches the shortest string that starts with a and ends with b. Applied to aabab, it matches aab and ab.
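The aabab example can be reproduced directly. This sketch assumes Python's re module, whose findall returns every non-overlapping match from left to right.

```python
import re

text = "aabab"

greedy = re.findall(r"a.*b", text)   # .* grabs as much as it can
lazy = re.findall(r"a.*?b", text)    # .*? stops at the first b that works

print(greedy)
print(lazy)
```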

The difference between greedy and lazy mode is that lazy mode adds one more question mark ? after the quantifier.

Advanced 5 - Priority of pattern matching

When using regular expressions, you need to pay attention to the order of matching. Operations with the same priority are usually evaluated from left to right, and operations with different priorities are evaluated from higher to lower. The matching priority of the various operators, from high to low, is shown in the following table.

Order    Metacharacters    Description
1    \    Escape character
2    () (?:) (?=) []    Pattern units and atom tables
3    * + ? {n} {n,} {n,m}    Repeated matching
4    ^ $ \b \B \A \Z    Boundary restrictions
5    |    Pattern selection
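A few quick checks illustrate how these priorities affect the way a pattern is read. This is a sketch assuming Python's re module, with patterns and strings made up for the purpose.

```python
import re

def full(pattern, text):
    """True if the pattern matches the entire text."""
    return re.fullmatch(pattern, text) is not None

# Escape is highest: \+ is a literal plus sign, and the second + repeats it.
escaped_plus = full(r"a\++", "a+++")

# Repetition outranks plain sequence: in ab+ only the b repeats.
only_b_repeats = full(r"ab+", "abbb")
ab_not_repeated = full(r"ab+", "abab")
grouped_repeats = full(r"(ab)+", "abab")

# Selection is lowest: ab|cd+ reads as "ab" or "c followed by one or more d".
right_branch = full(r"ab|cd+", "cddd")
not_mixed = full(r"ab|cd+", "abd")

print(escaped_plus, only_b_repeats, ab_not_repeated, grouped_repeats)
print(right_branch, not_mixed)
```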
Example

1. Character escape

Question 1: To match the string 333333\$33\33333, how should the \$ in it be written?

Question 2: If PHP's preg_match function is given the expression in single quotes and in double quotes to match the \$ above, how should each be written?

Answer:

  • The rule required for the expression is \\\$
  • Using single quotes, the expression is written as '/\\\\\\$/'. (For ease of viewing, split it into '/\\ \\ \\ $/')
  • Using double quotes, the expression is written as "/\\\\\\\$/". (For ease of viewing, split it into "/\\ \\ \\ \$/")

Why?

  1. Single-quoted strings in PHP escape almost nothing; apart from the quote character itself they only escape \, so 6 \ are needed to generate the 3 in the expression.
  2. Double-quoted strings, besides escaping \, need one more \ to escape $, so 7 \ are required.
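The regex-level rule \\\$ is independent of PHP's quoting, which can be seen by trying it in another language. The sketch below uses Python's re module as a stand-in, so it says nothing about preg_match itself; the shortened subject string is an assumption made for the example. Python's ordinary string literals happen to need 6 backslashes here, like PHP's single quotes, while a raw string lets the rule be written as is.

```python
import re

# A raw string keeps the backslash, so the subject contains \ followed by $.
subject = r"333333\$33"

raw_pattern = r"\\\$"       # the rule as the regex engine should see it
plain_pattern = "\\\\\\$"   # 6 backslashes collapse to 3 in an ordinary literal

same_pattern = raw_pattern == plain_pattern
match = re.search(raw_pattern, subject)
matched_text = match.group() if match else None

print(same_pattern)
print(matched_text)
```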

source:php.cn