Metacharacters in PHP regular expressions

Metacharacters

Here is a problem: \d matches a single digit. What should I do if I want to match eight digits, ten digits, or any number of digits?

This is where metacharacters come in. An atom on its own matches only one character, so matching several characters becomes a problem.
Metacharacters modify atoms and let us achieve more.

Don’t be put off by the table below. Everything will become clear as we work through the experiments one by one. The main point is that these metacharacters are general-purpose.
It helps to keep a small reference card as a memory aid.

Let’s take a look:

Metacharacter	Description
*	Matches the preceding atom zero or more times
+	Matches the preceding atom one or more times
?	Makes the preceding atom optional: it appears zero times or once
.	Matches any single character except \n (strictly speaking, the dot is better counted as an atom)
|	Or (alternation). Note: it has the lowest priority
^	The subject must start with what follows ^
$	The subject must end with what precedes $
\b	Word boundary
\B	Non-word boundary
{m}	The preceding atom appears exactly m times
{n,m}	The preceding atom appears n to m times
{m,}	The preceding atom appears at least m times, with no upper limit
()	Changes priority or treats a sequence as a unit; also captures the matched text so it can be extracted

+ matches the preceding character at least once.

The match succeeds, which demonstrates the + in \d+: \d matches a digit, and + requires the preceding atom to appear at least once.
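The behaviour of \d+ can be tried with a minimal sketch like the one below. The sample strings are assumptions, since the original inputs are not shown; only preg_match() and the pattern itself come from the text.

```php
<?php
// Assumed sample inputs; the original strings are not shown in the text.
$pattern = '/\d+/';

// At least one digit is present, so the match succeeds.
$matched = preg_match($pattern, 'abc12345def', $m);
var_dump($matched); // int(1)
var_dump($m[0]);    // string(5) "12345"

// No digit at all: + needs at least one, so the match fails.
$unmatched = preg_match($pattern, 'abcdef');
var_dump($unmatched); // int(0)
```

preg_match() returns 1 on a match and 0 otherwise, and $m[0] holds the text that was matched.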

* Matches the previous character 0 times or any number of times

Explanation: both $string and the commented-out $string1 match successfully, because \w matches 0-9, A-Z, a-z and _, and * means the preceding \w does not have to be present. If it is present, it can appear once or many times.
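A small sketch of the zero-or-more behaviour, assuming the pattern /\w*/ and two made-up inputs (neither is taken from the original example):

```php
<?php
// Assumed pattern and inputs, chosen only to illustrate "zero or more".
$pattern = '/\w*/';

// \w* greedily consumes the leading run of word characters.
$hit = preg_match($pattern, 'php_7 rocks', $m1);
var_dump($hit, $m1[0]); // int(1), string(5) "php_7"

// No word characters at all: * still succeeds by matching zero of them.
$empty = preg_match($pattern, '!!!', $m2);
var_dump($empty, $m2[0]); // int(1), string(0) ""
```

The second call shows why a pattern made only of starred atoms always matches: an empty match counts as success.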

? The previous character appears 0 or 1 times, optional

$string and $string2 match successfully, but $string1 fails to match.
The pattern requires ABC on both sides with a digit 0-9 in between. The digit is optional, but there cannot be more than one.
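The following sketch reproduces that outcome under assumptions: the pattern /ABC\d?ABC/ and the three strings are guesses that fit the description, not the original code.

```php
<?php
// Assumed pattern and inputs that fit the description above.
$pattern = '/ABC\d?ABC/';
$string  = 'ABC1ABC';   // one digit in the middle
$string1 = 'ABC12ABC';  // two digits in the middle
$string2 = 'ABCABC';    // no digit in the middle

$r  = preg_match($pattern, $string);   // 1: the digit appears once
$r1 = preg_match($pattern, $string1);  // 0: ? allows at most one digit
$r2 = preg_match($pattern, $string2);  // 1: the digit is optional
var_dump($r, $r1, $r2);
```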

. (dot) Matches all characters except \n

The dot stands for exactly one character of any kind, such as a letter, a digit, a space or a punctuation mark.
Only the newline \n is excluded, so the match fails when a newline sits where the dot is expected.
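For the dot itself, a minimal sketch with an assumed pattern /a.c/ and made-up inputs looks like this:

```php
<?php
// Assumed pattern and inputs; they are illustrations, not the original example.
$pattern = '/a.c/';

$letter  = preg_match($pattern, 'abc');    // 1: . matches "b"
$dash    = preg_match($pattern, 'a-c');    // 1: . matches "-"
$newline = preg_match($pattern, "a\nc");   // 0: . does not match \n
$missing = preg_match($pattern, 'ac');     // 0: . needs exactly one character
var_dump($letter, $dash, $newline, $missing);
```

Note the double quotes around "a\nc": in PHP only double-quoted strings turn \n into a real newline.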

| (vertical bar): or, with the lowest priority

Let’s use experiments to see how alternation and priority interact.

Let’s take a look:

1. At first, my idea was to match abccd or abbcd. However, when matching $string1 and $string2, the results are abc and bcd.

2. Alternation does work, but the alternatives are abc or bcd: | has a lower priority than characters written next to each other.
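The two observations can be reproduced with a sketch like this one. The pattern /abc|bcd/ and the input strings are assumptions inferred from the results described above.

```php
<?php
// Assumed pattern and inputs, inferred from the results described above.
$pattern = '/abc|bcd/';
$string1 = 'abccd';
$string2 = 'abbcd';

// | splits the whole pattern into "abc" OR "bcd", not "ab" + (c or b) + "cd".
preg_match($pattern, $string1, $m1);
preg_match($pattern, $string2, $m2);
var_dump($m1[0]); // string(3) "abc"
var_dump($m2[0]); // string(3) "bcd"
```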

Then the question is, what should I do if I want to match abccd or abbcd in the above example?

You need to use () to change the priority.


Conclusion:

1. It does match abccd or abbcd ($string1 or $string3).

2. But the match array has one more element, and its index is 1.

3. Whenever the content inside () matches, the matched text is stored in this array element with index 1.
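A sketch of the grouped version follows. The pattern /ab(c|b)cd/ is an assumption that fits the stated goal, and the inputs are made up.

```php
<?php
// Assumed pattern: the parentheses limit the alternation to "c" or "b".
$pattern = '/ab(c|b)cd/';

preg_match($pattern, 'abccd', $m1);
preg_match($pattern, 'abbcd', $m3);

// Index 0 is the full match; index 1 is what the group captured.
print_r($m1); // [0] => abccd, [1] => c
print_r($m3); // [0] => abbcd, [1] => b
```

If the extra element is not wanted, PCRE also offers the non-capturing form (?:c|b), which groups without adding index 1.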

^ (circumflex): the subject must start with what follows ^

The experiment leads to the following conclusions:

1. $string1 matches successfully, but $string2 does not.

2. That is because $string1 starts with the specified characters.

3. $string2 does not start with the characters after ^.

4. Translated into words, this pattern means: start with "Brother Zhu is so handsome", followed by at least one character from a-zA-Z0-9_.
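A minimal sketch of the anchored pattern, using the translated phrase as the literal. Both input strings are assumptions.

```php
<?php
// The literal is the translated phrase from the text; the inputs are assumed.
$pattern = '/^Brother Zhu is so handsome\w+/';
$string1 = 'Brother Zhu is so handsome2016';
$string2 = 'Look: Brother Zhu is so handsome2016';

$r1 = preg_match($pattern, $string1); // 1: the subject starts with the phrase
$r2 = preg_match($pattern, $string2); // 0: other text comes before the phrase
var_dump($r1, $r2);
```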

$ (dollar sign): the subject must end with what precedes $

Let’s run it and draw conclusions from the results:

1. $string1 matches successfully, but $string2 does not.

2. What precedes $ is \d+ followed by the Chinese word for "effort".

3. So the string must end with that whole sequence. \d matches a digit 0-9, and + requires at least one digit.
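The end anchor can be sketched as follows. The English word effort stands in for the Chinese literal, and both inputs are assumptions.

```php
<?php
// "effort" is a stand-in for the Chinese literal; the inputs are assumed.
// Single quotes keep PHP from treating $ as the start of a variable name.
$pattern = '/\d+effort$/';
$string1 = 'keep going: 100effort';
$string2 = '100effort, keep going';

$r1 = preg_match($pattern, $string1); // 1: digits + "effort" end the string
$r2 = preg_match($pattern, $string2); // 0: more text follows "effort"
var_dump($r1, $r2);
```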

\b and \B word boundary and non-word boundary

Let’s explain what a boundary is:

1. The string being matched has boundaries: its start and its end each count as a boundary when a word character sits there.

2. Take the English word "this" followed by a space: the space means the word has ended, so we have reached the boundary of the word.

\b (word boundary) means the position must be at the start or the end of a word.
\B (non-boundary) means the position must not be at the start or the end of a word.

Conclusion:

1. $string1, $string2 and $string3 all match successfully.

2. In $string1, the space after this is the boundary.

3. In $string2, the boundary comes right after thisis.

4. In $string3, thisisaapple runs to the end of the string, and the end of the string is also a boundary. So the match is successful.
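These three cases can be reproduced with a sketch such as the one below. The pattern /\w+\b/ and the exact strings are assumptions reconstructed from the conclusions.

```php
<?php
// Assumed pattern and inputs, reconstructed from the conclusions above.
$pattern = '/\w+\b/';
$inputs  = ['this is a apple', 'thisis a apple', 'thisisaapple'];

$found = [];
foreach ($inputs as $s) {
    // \w+ takes a run of word characters; \b requires the run to end a word.
    preg_match($pattern, $s, $m);
    $found[] = $m[0];
}
print_r($found); // this, thisis, thisisaapple
```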

Let’s experiment with non-word boundaries:

Summary:

1. $string1 matches successfully, but $string2 does not.

2. Because \B comes right before this, the word this must not begin at a word boundary (after a space or at the start of the string).
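A sketch of the non-boundary case, assuming the pattern /\Bthis/ and two made-up inputs:

```php
<?php
// Assumed pattern and inputs; only \B followed by "this" comes from the text.
$pattern = '/\Bthis/';
$string1 = 'hellothis';   // "this" sits inside a longer word
$string2 = 'hello this';  // "this" starts right after a space

$r1 = preg_match($pattern, $string1); // 1: no boundary before "this"
$r2 = preg_match($pattern, $string2); // 0: the space creates a boundary
var_dump($r1, $r2);
```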

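{m} the preceding atom must appear exactly m times

Conclusion:
With \d{3}, I specify that a digit 0-9 must appear exactly 3 times, not one more and not one fewer. Note that without anchors or surrounding literals, a longer run of digits still contains a 3-digit match.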
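The difference between the bare and the anchored form of \d{3} can be sketched like this. All inputs are assumptions, and the anchors ^ and $ are added here only for illustration.

```php
<?php
// Assumed inputs. The unanchored pattern finds 3 digits inside a longer run.
$loose = preg_match('/\d{3}/', '12345', $m);
var_dump($loose, $m[0]); // int(1), string(3) "123"

// With ^ and $ the whole string must be exactly 3 digits.
$exact    = preg_match('/^\d{3}$/', '123');   // 1
$tooLong  = preg_match('/^\d{3}$/', '12345'); // 0
$tooShort = preg_match('/^\d{3}$/', '12');    // 0
var_dump($exact, $tooLong, $tooShort);
```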

{n,m} can appear n to m times

Conclusion:
With \d{1,3}, I specify that a digit 0-9 can appear once, twice or three times. Any other count is wrong.
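A short sketch of the range form, again anchored with ^ and $ so the count applies to the whole string. The inputs are assumptions.

```php
<?php
// Assumed inputs; ^ and $ make the count apply to the entire string.
$pattern = '/^\d{1,3}$/';
$inputs  = ['7', '42', '123', '1234', ''];

$results = [];
foreach ($inputs as $s) {
    // 1 means the string is one to three digits long, 0 means it is not.
    $results[] = preg_match($pattern, $s);
}
print_r($results); // 1, 1, 1, 0, 0
```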

{m,} At least m times, the maximum number is not limited

Conclusion:
With \d{2,}, I require that the digits 0-9 after "drink" appear at least twice, with no upper limit. Therefore $string1 fails to match, while $string2 and $string3 match successfully.
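The open-ended form can be sketched as follows. The literal drink and the three strings are assumptions chosen to mirror the outcome described above.

```php
<?php
// "drink" and the inputs are assumptions that mirror the described outcome.
$pattern = '/drink\d{2,}/';
$string1 = 'drink1';
$string2 = 'drink12';
$string3 = 'drink12345';

$r1 = preg_match($pattern, $string1);      // 0: only one digit
$r2 = preg_match($pattern, $string2);      // 1: exactly two digits
$r3 = preg_match($pattern, $string3, $m);  // 1: five digits, no upper limit
var_dump($r1, $r2, $r3, $m[0]); // ..., string(10) "drink12345"
```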
