Summary of regular expression characters-PHP Tutorial-php.cn

Summary of regular expression characters

小云云

Release： 2023-03-20 11:12:02


Basic regular expression

Matching a single character

To match a single digit, write "[0-9]" or "\d".

To match a single non-digit character, use the uppercase "\D".

To match any one of the 26 letters, in either case, use "[a-zA-Z]".

To match any one character, use the period ".".

To match specific characters, write them directly. For example, "abcd" matches itself. Special characters must be escaped, and the escape character is the backslash "\".

Square brackets also match a single character and define what is called a "character set": the brackets specify a set, and the pattern matches one character from that set, such as the hexadecimal digit "[0-9a-fA-F]". Inside a character set the dot stands for the dot itself, but some special characters, such as the backslash, still need to be escaped.
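The single-character patterns above can be tried out with a short sketch. It uses Python's standard `re` module, which is an assumption of this example rather than something the article prescribes; the sample strings are made up for illustration.

```python
import re

# "\d" behaves like "[0-9]" on ASCII input; "\D" is its opposite.
digits = re.findall(r"\d", "a1b2")
non_digits = re.findall(r"\D", "a1b2")

# "[a-zA-Z]" matches one ASCII letter at a time.
letters = re.findall(r"[a-zA-Z]", "x9Y_")

# "." stands for exactly one character, so "ac" does not match "a.c".
any_char = re.findall(r"a.c", "abc a-c ac")

# An escaped dot matches only a real dot.
literal_dot = re.findall(r"1\.5", "1.5 125")

# A character set for one hexadecimal digit.
hex_digits = re.findall(r"[0-9a-fA-F]", "0xBEEFz")

# Inside a set the dot is already literal.
dot_in_set = re.findall(r"[.]", "a.b")

print(digits, non_digits, letters, any_char, literal_dot, hex_digits, dot_in_set)
```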
Use quantifiers
Greedy matching
To repeat a rule, use a quantifier. Curly braces give the number of repetitions. For example, 8 digits can be written as "\d{8}".
The count in the curly braces can also be a range. For example, 7 to 8 digits is written as "\d{7,8}". The upper limit on the right may be omitted: "{0,}" is legal and means zero or more occurrences. The lower limit is a different matter: "{,10}", which tries to give an upper limit alone, is not accepted as a quantifier by every engine (JavaScript in its default mode, for instance, treats it as literal text), so write "{0,10}" instead.
The plus sign "+" means the element to its left occurs one or more times, the same as "{1,}". The plus sign is therefore also a special character.
The asterisk "*" means the element to its left occurs zero or more times, the same as "{0,}".
The question mark "?" means "zero or one", the same as "{0,1}".
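As a rough illustration of these quantifiers, the following sketch uses Python's `re` module (an assumption of this example; the sample strings are invented). `re.fullmatch` is used where the whole string has to fit the pattern.

```python
import re

# Exactly 8 digits.
eight = re.fullmatch(r"\d{8}", "20230320")

# 7 or 8 digits: a 7-digit string fits, a 6-digit string does not.
seven_or_eight = re.fullmatch(r"\d{7,8}", "5551234")
too_short = re.fullmatch(r"\d{7,8}", "555123")

# "+" is {1,}: at least one "b" is required after "a".
one_or_more = re.findall(r"ab+", "a ab abbb")

# "*" is {0,}: a bare "a" also matches.
zero_or_more = re.findall(r"ab*", "a ab abbb")

# "?" is {0,1}: the "u" is optional.
optional = re.findall(r"colou?r", "color colour")

print(one_or_more, zero_or_more, optional)
```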
Lazy matching
Quantifiers such as "+" and "*" are "greedy" by default: they match as many characters as possible. For example, matching "5+" against the string "55555" yields the longest match available, "55555".
Adding a question mark after the quantifier makes it "lazy", so it matches as few characters as possible. For example, "5+?" against the same string matches only the single character "5".
The lazy quantifiers are: +?, *?, {n,}?, {m,n}?
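The difference between the two modes is easiest to see side by side. This is a minimal sketch in Python's `re` module (an assumption of this example); the `<b>` snippet is a made-up string, not taken from the article.

```python
import re

# The article's own example: greedy takes all five, lazy takes one.
greedy = re.search(r"5+", "55555").group()
lazy = re.search(r"5+?", "55555").group()

# A more typical case: greedy ".*" runs to the last "</b>",
# lazy ".*?" stops at the first one.
html = "<b>one</b><b>two</b>"
greedy_tag = re.findall(r"<b>.*</b>", html)
lazy_tag = re.findall(r"<b>.*?</b>", html)

print(greedy, lazy, greedy_tag, lazy_tag)
```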
Capture groups (similar to macro definitions)
You can "capture" part of an expression and refer to it later, much like a macro. Wrap the part in parentheses to capture it, then use "\1" to refer to what the first group matched; use "\2" for the second group, and so on.
Groups are saved by default, but in a long expression you may want to state explicitly that a group should not be saved. Writing "(?:THE|The|the)", for example, uses the "?:" marker to group the alternatives without capturing them, so no group number is assigned.
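A short sketch of back-references and non-capturing groups, again assuming Python's `re` module and using invented sample strings:

```python
import re

# "\1" must repeat exactly what group 1 captured, so this finds a doubled word.
doubled = re.search(r"\b(\w+) \1\b", "it is is fine")

# Two groups referenced in reverse order: a four-digit palindrome.
palindrome = re.fullmatch(r"(\d)(\d)\2\1", "1221")

# "(?:...)" groups without capturing, so "(\w+)" is still group 1.
article = re.search(r"(?:THE|The|the) (\w+)", "The cat sat")

print(doubled.group(0), palindrome.groups(), article.groups())
```

Because the non-capturing group takes no number, later back-references keep the numbering they would have had without it.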
"OR" logic
Use "|" to join two alternatives and get "OR" logic. Note the use of parentheses to limit how far the alternation extends.
"Not" logic
If the character "^" is the first character inside a set "[...]", it means "not". For example, "[^0-9]" is equivalent to "\D".
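The role of the parentheses around an alternation, and the negated set, can be sketched as follows. The example assumes Python's `re` module; the words are chosen only for illustration.

```python
import re

# With a group, "|" chooses between "a" and "e" only.
either = re.findall(r"gr(?:a|e)y", "gray grey griy")

# Without a group, "|" splits the whole pattern into "gra" or "ey".
ungrouped = re.findall(r"gra|ey", "gray grey")

# "^" as the first character of a set negates it.
not_digits = re.findall(r"[^0-9]", "a1b2")

print(either, ungrouped, not_digits)
```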

Simple pattern matching
The following is a list of commonly used single character matches:
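Reference type	Pattern	Remarks
Digit	\d	Equivalent to "[0-9]"
Letters, digits, underscore	\w	[_a-zA-Z0-9]
Non-digit	\D	
Non-word character	\W	
Tab character	\t	Tab
Null character	\0	
Backspace	[\b]	
Whitespace	\s	[ \t\n\r]
Carriage return	\r	
Line break	\n	
Word boundary	\b	Matches only the beginning or end of a word; no characters are consumed
Any character	.	Cannot match a line terminator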
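A few of these shorthand classes in action, as a minimal sketch assuming Python's `re` module. The `re.DOTALL` flag at the end is Python-specific and is shown only to contrast with the default behaviour of the dot.

```python
import re

# "\w" covers letters, digits and the underscore; "\W" is everything else.
words = re.findall(r"\w+", "snake_case 42!")
non_word = re.findall(r"\W", "a-b c!")

# "\s" matches spaces, tabs and line breaks alike.
fields = re.split(r"\s+", "a \t b\nc")

# Inside square brackets, "\b" means the backspace character.
backspace = re.findall(r"[\b]", "a\bb")

# "." skips line terminators unless told otherwise.
dot_default = re.findall(r".", "a\nb")
dot_all = re.findall(r".", "a\nb", flags=re.DOTALL)

print(words, non_word, fields, backspace, dot_default, dot_all)
```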
Boundary
This section involves a concept called the assertion, also known as the "zero-width assertion". An assertion does not match characters; it matches positions in the string.
Start and end of line

Use "^" to indicate the beginning of a line.
Use "$" to indicate the end of a line.
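The two anchors can be sketched like this, assuming Python's `re` module. Note one Python-specific detail: by default "^" and "$" refer to the whole string, and the `re.MULTILINE` flag is needed to make them work line by line. Other engines may behave differently.

```python
import re

text = "first line\nsecond line"

# With MULTILINE, "^" and "$" match at the start and end of every line.
starts = re.findall(r"^\w+", text, flags=re.MULTILINE)
ends = re.findall(r"\w+$", text, flags=re.MULTILINE)

# Without the flag, only the start and end of the whole string count.
first_only = re.findall(r"^\w+", text)
last_only = re.findall(r"\w+$", text)

print(starts, ends, first_only, last_only)
```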
Word boundaries and non-word boundaries
For example, to match the word "the", write "\bthe\b". To match an "e" in the middle of a word, such as the one in "brother", write "\Be\B".
Some tools also accept "\<" to match the beginning of a word and "\>" to match the end. These two are not recommended, however, because newer engines may not support them.
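The two boundary patterns from this section can be checked with a short sketch. It assumes Python's `re` module and reports match positions rather than matched text, since assertions consume no characters; the sample sentences are invented.

```python
import re

# "\bthe\b" finds "the" only as a whole word, not inside "other" or "then".
sentence = "the other then the"
whole_word_positions = [m.start() for m in re.finditer(r"\bthe\b", sentence)]

# "\Be\B" finds an "e" with word characters on both sides:
# the "e" in "brother" qualifies, the final "e" of "the" does not.
inner_e_positions = [m.start() for m in re.finditer(r"\Be\B", "brother the")]

print(whole_word_positions, inner_e_positions)
```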
Unicode characters and other characters
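Regular expressions accept Unicode code points written as escapes, such as "\u00e9". Note that a "\u" escape must have exactly four hexadecimal digits, in upper or lower case. JavaScript also supports the two-digit form "\xe9", but "\x00e9" is wrong: it is read as "\x00" followed by the literal characters "e9".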
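The article describes JavaScript here; as a hedged illustration, Python's `re` module parses these three escapes the same way, so the sketch below uses it. The word "café" is a made-up sample.

```python
import re

word = "caf\u00e9"  # "café"

# Four hex digits after "\u", two after "\x": both name the same character.
by_u = re.search(r"\u00e9", word)
by_x = re.search(r"\xe9", word)

# "\x00e9" is read as the null character followed by the text "e9".
wrong = re.search(r"\x00e9", word)
null_then_e9 = re.fullmatch(r"\x00e9", "\x00e9")

print(by_u.group(), by_x.group(), wrong)
```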



