Content summary of PHP regular expressions (detailed)
This article is a detailed summary of PHP regular expressions: the basic syntax, the commonly used PHP regular expression functions with examples, and a collection of frequently needed patterns.
1. Regular expression basics
Line anchors (^ and $)
Line anchors describe the boundaries of a string. "^" represents the beginning of the line and "$" represents the end of the line. For example, "^de" matches a string that starts with de, and "de$" matches a string that ends with de.
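The two anchors can be tried directly with preg_match(), which returns 1 on a match and 0 otherwise. This is a minimal sketch; the sample strings "design" and "code" are hypothetical and chosen only for illustration.

```php
<?php
// Hypothetical sample strings, chosen only for illustration.
$startsWithDe = preg_match('/^de/', 'design'); // 1: "de" is at the start
$endsWithDe   = preg_match('/de$/', 'code');   // 1: "de" is at the end
$neither      = preg_match('/^de/', 'code');   // 0: "code" does not start with "de"

var_dump($startsWithDe, $endsWithDe, $neither);
```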
Word boundary (\b)
Suppose we want to know whether the word an occurs in the string "gril and body". The plain pattern an matches, because it is found inside "and". To match whole words instead of parts of words, use the word boundary \b.
The pattern \ban\b does not match "gril and body", because "an" does not occur there as a separate word.
There is also an uppercase \B, which means exactly the opposite of \b: the matched text must not be a complete word, but part of another word or string. An example is \Ban\B.
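The difference between the plain pattern, \b, and \B can be checked as follows. This is a small sketch; the second subject string "hands" is a hypothetical addition used to show a case where \Ban\B does match.

```php
<?php
$subject = 'gril and body';

$plain     = preg_match('/an/', $subject);      // 1: "an" is found inside "and"
$wholeWord = preg_match('/\ban\b/', $subject);  // 0: "an" is not a word by itself
$inside    = preg_match('/\Ban\B/', 'hands');   // 1: "an" sits inside "hands"

var_dump($plain, $wholeWord, $inside);
```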
Alternation character (|), meaning "or"
The alternation character means "or". For example, Aa|aA matches Aa or aA. The difference between "[]" and "|" is that "[]" matches only a single character, while "|" chooses between alternatives of any length. "[]" is often used together with the range character "-", as in [a-d], which matches a, b, c, or d.
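A short sketch of the difference, using anchored patterns so that the whole subject has to match. The parentheses around the alternation are an addition for the example; they keep ^ and $ applying to both alternatives.

```php
<?php
// "|" chooses between whole alternatives of any length.
$alt1 = preg_match('/^(Aa|aA)$/', 'aA'); // 1
$alt2 = preg_match('/^(Aa|aA)$/', 'AA'); // 0: neither alternative

// "[]" matches exactly one character from the set; "-" gives a range.
$cls1 = preg_match('/^[a-d]$/', 'c');    // 1
$cls2 = preg_match('/^[a-d]$/', 'cd');   // 0: two characters, the class matches one

var_dump($alt1, $alt2, $cls1, $cls2);
```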
Excluding characters (negation)
Regular expressions use "^" to exclude characters; for this purpose ^ is placed at the start of a [] character class. For example, [^1-5] matches any single character that is not a digit between 1 and 5.
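As a quick illustration (a sketch with made-up inputs), note that the negated class accepts any character outside the range, not only other digits:

```php
<?php
$outside = preg_match('/^[^1-5]$/', '7'); // 1: 7 is outside 1-5
$inRange = preg_match('/^[^1-5]$/', '3'); // 0: 3 is inside 1-5
$letter  = preg_match('/^[^1-5]$/', 'x'); // 1: a letter is also "not 1-5"

var_dump($outside, $inRange, $letter);
```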
Quantifiers (? * + {n,m})
Quantifiers limit how many times the preceding character or group may occur.
Quantifier | Meaning |
---|---|
? | Zero or one time |
* | Zero or more times |
+ | One or more times |
{n} | Exactly n times |
{n,} | At least n times |
{n,m} | n to m times |
For example, (D+) matches one or more D.
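The table can be verified with a few anchored patterns. This is only a sketch; the variable names are hypothetical and each pattern is anchored with ^ and $ so that the quantifier decides the whole match.

```php
<?php
$optional = preg_match('/^D?$/', '');         // 1: zero occurrences are allowed
$star     = preg_match('/^D*$/', 'DDD');      // 1: any number of D
$plus     = preg_match('/^D+$/', '');         // 0: needs at least one D
$exact    = preg_match('/^D{2}$/', 'DD');     // 1: exactly two
$atLeast  = preg_match('/^D{2,}$/', 'D');     // 0: fewer than two
$range    = preg_match('/^D{1,3}$/', 'DDDD'); // 0: more than three

var_dump($optional, $star, $plus, $exact, $atLeast, $range);
```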
Dot operator
The dot (.) matches any single character except a newline.
The backslash (\) in expressions
The backslash has several meanings in an expression: escaping characters, specifying predefined character sets, defining assertions, and displaying non-printable characters.
Escape characters
Escaping turns a special character into an ordinary one. Commonly escaped special characters include ".", "?", and "\".
Specifying a predefined character set
Characters | Meaning |
---|---|
\d | Any decimal digit [0-9] |
\D | Any character that is not a decimal digit |
\s | Any whitespace character (space, line feed, form feed, carriage return, tab) |
\S | Any non-whitespace character |
\w | Any word character |
\W | Any non-word character |
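The predefined sets are often used with preg_replace() and preg_split(), both of which are covered in section 2. The following is a rough sketch with invented input strings:

```php
<?php
// Keep only the digits by removing every non-digit (\D).
$digits = preg_replace('/\D/', '', 'Tel: 021-8788');

// Split on runs of non-word characters (\W).
$words = preg_split('/\W+/', 'one, two;three');

// Remove every whitespace character (\s), including tab and newline.
$noSpace = preg_replace('/\s/', '', "a b\tc\n");

var_dump($digits, $words, $noSpace);
```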
Displaying non-printable characters
Characters | Meaning |
---|---|
\a | Alarm (bell) |
\b | Backspace (only inside a character class; elsewhere \b is a word boundary) |
\f | Form feed |
\n | Line feed |
\r | Carriage return |
\t | Tab |
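Parentheses ()
The main function of parentheses in regular expressions is:
Changing the scope of operators such as |, *, and ^. For example, (my|your)baby matches "mybaby" or "yourbaby", whereas my|yourbaby matches "my" or "yourbaby".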
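The effect of parentheses on scope can be seen by comparing patterns with and without them. This is an illustrative sketch; the subjects "abab" and "gray area" are hypothetical.

```php
<?php
// Scope of a quantifier
$noGroup   = preg_match('/^ab+$/', 'abab');   // 0: "+" applies to "b" only
$withGroup = preg_match('/^(ab)+$/', 'abab'); // 1: "+" applies to the group "ab"

// Scope of alternation and anchors
$loose  = preg_match('/^gray|grey$/', 'gray area');   // 1: the branch "^gray" matches alone
$strict = preg_match('/^(gray|grey)$/', 'gray area'); // 0: the whole string must be one word

var_dump($noGroup, $withGroup, $loose, $strict);
```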
Pattern modifiers
A pattern modifier sets the mode, that is, how the regular expression is interpreted. The main modifiers in PHP are as follows:
Modifier | Description |
---|---|
i | Ignore case |
m | Multi-line mode |
s | Single-line mode |
x | Ignore whitespace characters in the pattern |
U | Lazy mode (without it, matching is greedy by default) |
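A compact sketch of each modifier in action follows. The subject strings are invented for the example; the comments describe the standard PCRE behavior of each flag.

```php
<?php
$text = "first\nSecond";

$i   = preg_match('/second/i', $text);          // 1: case is ignored
$m   = preg_match('/^Second$/m', $text);        // 1: ^ and $ also match at line breaks
$noS = preg_match('/first.Second/', $text);     // 0: "." does not match "\n"
$s   = preg_match('/first.Second/s', $text);    // 1: with s, "." also matches "\n"
$x   = preg_match('/ first \n Second /x', $text); // 1: literal spaces in the pattern are ignored

// Greedy (default) versus lazy (U) matching
preg_match('/<b>.*<\/b>/', '<b>a</b><b>b</b>', $greedy);
preg_match('/<b>.*<\/b>/U', '<b>a</b><b>b</b>', $lazy);

var_dump($i, $m, $noS, $s, $x, $greedy[0], $lazy[0]);
```

The greedy match spans both tags, while the lazy one stops at the first closing tag.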
2. Commonly used PHP regular expression functions and examples
a. preg_grep() function
The preg_grep function returns the array entries that match a pattern.
Syntax
array preg_grep ( string $pattern , array $input [, int $flags = 0 ] )
Return the elements of an array that match the pattern:
<?php
$array = array(1, 2, 3.4, 53, 7.9);
// Return all elements that are floating point numbers
$fl_array = preg_grep('/^(\d+)?\.\d+$/', $array);
print_r($fl_array);
?>
The execution result is as follows:
Array ( [2] => 3.4 [4] => 7.9 )
As you can see, preg_grep returns only the floating point numbers in the array.
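The $flags parameter accepts PREG_GREP_INVERT, which returns the entries that do not match the pattern. As a small sketch reusing the array from the example above (the variable name $nonFloats is hypothetical):

```php
<?php
$array = array(1, 2, 3.4, 53, 7.9);

// PREG_GREP_INVERT returns the entries that do NOT match the pattern.
$nonFloats = preg_grep('/^(\d+)?\.\d+$/', $array, PREG_GREP_INVERT);

// The original keys are preserved, so the result has keys 0, 1 and 3.
print_r($nonFloats);
```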
b. preg_match() function
The preg_match function performs a regular expression match.
Syntax
int preg_match ( string $pattern , string $subject [, array &$matches [, int $flags = 0 [, int $offset = 0 ]]] )
<?php
// The "i" after the pattern delimiter marks a case-insensitive search
if (preg_match('/php/i', 'PHP is the web scripting language of choice.')) {
    echo 'A match for the string php was found.';
} else {
    echo 'No match for the string php was found.';
}
?>
The execution result is as follows:
A match for the string php was found.
Find the word "web":
<?php
/* \b in the pattern marks a word boundary, so only the standalone word "web"
 * is matched, not part of a longer word such as "webbing" or "cobweb" */
if (preg_match('/\bweb\b/i', 'PHP is the web scripting language of choice.')) {
    echo "A match was found.\n";
} else {
    echo "No match was found.\n";
}
if (preg_match('/\bweb\b/i', 'PHP is the website scripting language of choice.')) {
    echo "A match was found.\n";
} else {
    echo "No match was found.\n";
}
?>
The execution result is as follows:
A match was found.
No match was found.
Get the domain name from a URL:
<?php
// Get the host name from the URL
preg_match('@^(?:http://)?([^/]+)@i', "http://www.runoob.com/index.html", $matches);
$host = $matches[1];
// Get the last two parts of the host name
preg_match('/[^.]+\.[^.]+$/', $host, $matches);
echo "domain name is: {$matches[0]}\n";
?>
The execution result is as follows:
domain name is: runoob.com
c. preg_match_all() function
The preg_match_all function performs a global regular expression match.
Syntax
int preg_match_all ( string $pattern , string $subject [, array &$matches [, int $flags = PREG_PATTERN_ORDER [, int $offset = 0 ]]] )
<?php
$userinfo = "Name: <b>PHP</b> <br> Title: <b>Programming Language</b>";
preg_match_all('/<b>(.*)<\/b>/U', $userinfo, $pat_array);
print_r($pat_array[0]);
?>
The execution result is as follows:
Array ( [0] => <b>PHP</b> [1] => <b>Programming Language</b> )
d. preg_replace() function
The preg_replace function performs a regular expression search and replace.
Syntax
mixed preg_replace ( mixed $pattern , mixed $replacement , mixed $subject [, int $limit = -1 [, int &$count ]] )
$count: Optional. The number of replacements performed.
If subject is an array, preg_replace() returns an array; otherwise it returns a string.
If matches are found, the new subject is returned; otherwise the subject is returned unchanged. If an error occurs, NULL is returned.
<?php
$string = 'google 123, 456';
$pattern = '/(\w+) (\d+), (\d+)/i';
$replacement = 'runoob ${2},$3';
echo preg_replace($pattern, $replacement, $string);
?>
The execution result is as follows:
runoob 123,456
<?php
$str = 'runo o b';
$str = preg_replace('/\s+/', '', $str);
// $str is now 'runoob'
echo $str;
?>
The execution result is as follows:
runoob
<?php
$string = 'The quick brown fox jumped over the lazy dog.';
$patterns = array();
$patterns[0] = '/quick/';
$patterns[1] = '/brown/';
$patterns[2] = '/fox/';
$replacements = array();
$replacements[2] = 'bear';
$replacements[1] = 'black';
$replacements[0] = 'slow';
// Patterns and replacements are paired in the order the elements were added, not by key
echo preg_replace($patterns, $replacements, $string);
?>
The execution result is as follows:
The bear black slow jumped over the lazy dog.
<?php
$count = 0;
echo preg_replace(array('/\d/', '/\s/'), '*', 'xp 4 to', -1, $count), "\n";
echo $count; // 3
?>
The execution result is as follows:
xp***to
3
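e. preg_split() function
The preg_split function splits a string by a regular expression.
Syntax
array preg_split ( string $pattern , string $subject [, int $limit = -1 [, int $flags = 0 ]] )
Parameters:
$pattern: The pattern to search for, as a string.
$subject: The input string.
$limit: Optional. If specified, at most limit substrings are returned, and the last substring contains the rest of the string. A limit of -1, 0, or null means "no limit"; as is standard in PHP, you can pass null to skip ahead to the flags parameter.
$flags: Optional. Any combination of the following flags (combined with the bitwise OR operator |):
PREG_SPLIT_NO_EMPTY: If this flag is set, preg_split() returns only the non-empty pieces.
PREG_SPLIT_DELIM_CAPTURE: If this flag is set, parenthesized expressions in the delimiter pattern are captured and returned as well.
PREG_SPLIT_OFFSET_CAPTURE: If this flag is set, the string offset is returned along with each match. Note: this changes every element of the returned array into an array whose element 0 is the substring and whose element 1 is the offset of that substring in subject.
Returns an array of the substrings obtained by splitting subject along the boundaries matched by pattern.
<?php
// Split the phrase by commas or whitespace (including " ", \r, \t, \n, \f)
$keywords = preg_split('/[\s,]+/', "hypertext language, programming");
print_r($keywords);
?>
The execution result is as follows:
Array ( [0] => hypertext [1] => language [2] => programming )
<?php
$str = 'runoob';
$chars = preg_split('//', $str, -1, PREG_SPLIT_NO_EMPTY);
print_r($chars);
?>
The execution result is as follows:
Array ( [0] => r [1] => u [2] => n [3] => o [4] => o [5] => b )
<?php
$str = 'hypertext language programming';
$chars = preg_split('/ /', $str, -1, PREG_SPLIT_OFFSET_CAPTURE);
print_r($chars);
?>
The execution result is as follows:
Array ( [0] => Array ( [0] => hypertext [1] => 0 ) [1] => Array ( [0] => language [1] => 10 ) [2] => Array ( [0] => programming [1] => 19 ) )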
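The examples above do not cover PREG_SPLIT_DELIM_CAPTURE or a positive $limit. The following sketch, with made-up input strings, shows both:

```php
<?php
// The delimiter is in parentheses, so it is returned between the pieces.
$withDelims = preg_split('/(-)/', 'a-b-c', -1, PREG_SPLIT_DELIM_CAPTURE);

// With a limit of 2, the last element holds the unsplit remainder.
$limited = preg_split('/[\s,]+/', 'a, b, c, d', 2);

print_r($withDelims);
print_r($limited);
```

Keeping the delimiters is useful when the string has to be reassembled later, for example when tokenizing.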
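I. Expressions for validating numbers
1. Digits:
^[0-9]*$
2. A number with n digits:
^\d{n}$
3. A number with at least n digits:
^\d{n,}$
4. A number with m to n digits:
^\d{m,n}$
5. Zero, or a number that does not start with zero:
^(0|[1-9][0-9]*)$
6. A number that does not start with zero, with at most two decimal places:
^[1-9][0-9]*(\.[0-9]{1,2})?$
7. A positive or negative number with 1 to 2 decimal places:
^(\-)?\d+(\.\d{1,2})?$
8. Positive numbers, negative numbers, and decimals:
^(\-|\+)?\d+(\.\d+)?$
9. A positive real number with two decimal places:
^[0-9]+(\.[0-9]{2})?$
10. A positive real number with 1 to 3 decimal places:
^[0-9]+(\.[0-9]{1,3})?$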
11. Non-zero positive integer:
^[1-9]\d*$ or ^([1-9][0-9]*){1,3}$ or ^\+?[1-9][0-9]*$
12. Non-zero negative integer:
^\-[1-9][0-9]*$ or ^-[1-9]\d*$
13. Non-negative integer:
^\d+$ or ^([1-9]\d*|0)$
14. Non-positive integer:
^(-[1-9]\d*|0)$ or ^((-\d+)|(0+))$
15. Non-negative floating point number:
^\d+(\.\d+)?$ or ^([1-9]\d*\.\d*|0\.\d*[1-9]\d*|0?\.0+|0)$
16. Non-positive floating point number:
^((-\d+(\.\d+)?)|(0+(\.0+)?))$ or ^((-([1-9]\d*\.\d*|0\.\d*[1-9]\d*))|0?\.0+|0)$
17. Positive floating point number:
^([1-9]\d*\.\d*|0\.\d*[1-9]\d*)$ or ^(([0-9]+\.[0-9]*[1-9][0-9]*)|([0-9]*[1-9][0-9]*\.[0-9]+)|([0-9]*[1-9][0-9]*))$
18. Negative floating point number:
^-([1-9]\d*\.\d*|0\.\d*[1-9]\d*)$ or ^(-(([0-9]+\.[0-9]*[1-9][0-9]*)|([0-9]*[1-9][0-9]*\.[0-9]+)|([0-9]*[1-9][0-9]*)))$
19. Floating point number:
^(-?\d+)(\.\d+)?$ or ^-?([1-9]\d*\.\d*|0\.\d*[1-9]\d*|0?\.0+|0)$
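To use one of these expressions in PHP, wrap it in delimiters and pass it to preg_match(). The sketch below assumes a hypothetical helper named matchesPattern() and tests three patterns from the list (items 5, 8 and 19).

```php
<?php
// Hypothetical helper: true when $value matches $pattern.
function matchesPattern(string $pattern, string $value): bool
{
    return preg_match($pattern, $value) === 1;
}

$zeroOrNoLeadingZero = '/^(0|[1-9][0-9]*)$/';   // item 5
$signedDecimal       = '/^(\-|\+)?\d+(\.\d+)?$/'; // item 8
$float               = '/^(-?\d+)(\.\d+)?$/';   // item 19

var_dump(matchesPattern($zeroOrNoLeadingZero, '0'));   // true
var_dump(matchesPattern($zeroOrNoLeadingZero, '012')); // false: leading zero
var_dump(matchesPattern($signedDecimal, '-3.14'));     // true
var_dump(matchesPattern($signedDecimal, '3.'));        // false: no digit after the point
var_dump(matchesPattern($float, '-0.5'));              // true
```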
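II. Expressions for validating characters
Note: the \uXXXX escapes below use JavaScript-style syntax. PHP's PCRE functions do not support \u; in PHP, write the range as \x{4e00}-\x{9fa5} and add the u modifier.
1. Chinese characters:
^[\u4e00-\u9fa5]{0,}$
2. English letters and digits:
^[A-Za-z0-9]+$ or ^[A-Za-z0-9]{4,40}$
3. Any characters, with a length of 3 to 20:
^.{3,20}$
4. A string made up of the 26 English letters:
^[A-Za-z]+$
5. A string made up of the 26 uppercase English letters:
^[A-Z]+$
6. A string made up of the 26 lowercase English letters:
^[a-z]+$
7. A string made up of digits and the 26 English letters:
^[A-Za-z0-9]+$
8. A string made up of digits, the 26 English letters, or underscores:
^\w+$ or ^\w{3,20}$
9. Chinese characters, English letters, and digits, including the underscore:
^[\u4E00-\u9FA5A-Za-z0-9_]+$
10. Chinese characters, English letters, and digits, excluding the underscore and other symbols:
^[\u4E00-\u9FA5A-Za-z0-9]+$ or ^[\u4E00-\u9FA5A-Za-z0-9]{2,20}$
11. Input that does not contain characters such as %&',;=?$":
[^%&',;=?$\x22]+
12. Input that does not contain ~ (\x22 additionally excludes the double quote):
[^~\x22]+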
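The Chinese character patterns in this section use \uXXXX, which PCRE does not support. A PHP version can be sketched with \x{XXXX} and the u modifier, assuming the source file and the input strings are UTF-8 encoded:

```php
<?php
// PHP equivalent of ^[\u4e00-\u9fa5]+$ : \x{...} plus the u (UTF-8) modifier.
$onlyChinese = '/^[\x{4e00}-\x{9fa5}]+$/u';
// PHP equivalent of item 9: Chinese characters, letters, digits and underscore.
$chineseWord = '/^[\x{4e00}-\x{9fa5}A-Za-z0-9_]+$/u';

$a = preg_match($onlyChinese, '正则表达式');   // 1: Chinese characters only
$b = preg_match($onlyChinese, '正则regex');    // 0: contains Latin letters
$c = preg_match($chineseWord, '正则_regex1');  // 1: all characters are in the allowed set

var_dump($a, $b, $c);
```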
III. Expressions for special requirements
1. Email address:
^\w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*$
2. Domain name:
[a-zA-Z0-9][-a-zA-Z0-9]{0,62}(\.[a-zA-Z0-9][-a-zA-Z0-9]{0,62})+\.?
3. Internet URL:
[a-zA-Z]+://[^\s]* or ^http://([\w-]+\.)+[\w-]+(/[\w./?%&=-]*)?$
4. Mobile phone number:
^(13[0-9]|14[57]|15[0-35-9]|18[0-35-9])\d{8}$
5. Phone number ("XXX-XXXXXXX", "XXXX-XXXXXXXX", "XXX-XXXXXXXX", "XXXXXXX" and "XXXXXXXX"):
^(\(\d{3,4}\)|\d{3,4}-)?\d{7,8}$
6. Domestic telephone number (0511-4405222, 021-87888822):
\d{3}-\d{8}|\d{4}-\d{7}
7. ID number:
15 or 18-digit ID number:
^(\d{15}|\d{18})$
15-digit ID card:
^[1-9]\d{7}((0\d)|(1[0-2]))(([0-2]\d)|3[0-1])\d{3}$
18-digit ID card:
^[1-9]\d{5}[1-9]\d{3}((0\d)|(1[0-2]))(([0-2]\d)|3[0-1])\d{4}$
8. Short ID number (digits, optionally ending with the letter x):
^([0-9]){7,18}(x|X)?$
or
^\d{8,18}|[0-9x]{8,18}|[0-9X]{8,18}?$
9. Whether an account name is valid (starts with a letter, 5 to 16 bytes long, letters, digits, and underscores allowed):
^[a-zA-Z][a-zA-Z0-9_]{4,15}$
10. Password (starts with a letter, 6 to 18 characters long, may contain only letters, digits, and underscores):
^[a-zA-Z]\w{5,17}$
11. Strong password (must contain uppercase letters, lowercase letters, and digits, must not contain special characters, and must be 8 to 10 characters long):
^(?=.*\d)(?=.*[a-z])(?=.*[A-Z])[a-zA-Z0-9]{8,10}$
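The strong password rule relies on lookahead assertions: each (?=...) checks one requirement without consuming characters. The sketch below uses [a-zA-Z0-9]{8,10} as the final part so that special characters are actually rejected, as the description requires; the sample passwords are invented.

```php
<?php
// Three lookaheads: at least one digit, one lowercase, one uppercase letter.
// The final class restricts the password to 8-10 letters and digits.
$strong = '/^(?=.*\d)(?=.*[a-z])(?=.*[A-Z])[a-zA-Z0-9]{8,10}$/';

$ok        = preg_match($strong, 'Passw0rd1'); // 1: all requirements met
$noUpper   = preg_match($strong, 'password1'); // 0: no uppercase letter
$special   = preg_match($strong, 'Passw0rd!'); // 0: "!" is not allowed
$tooShort  = preg_match($strong, 'Pw0rd');     // 0: fewer than 8 characters

var_dump($ok, $noUpper, $special, $tooShort);
```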
12. Date format:
^\d{4}-\d{1,2}-\d{1,2}
13. The 12 months of the year (01~09 and 1~12):
^(0?[1-9]|1[0-2])$
14. The 31 days of a month (01~09 and 1~31):
^((0?[1-9])|((1|2)[0-9])|30|31)$
15. Input format for money:
16. (1) There are four representations of money that we can accept: "10000.00" and "10,000.00", and "10000" and "10,000" without the cents:
^[1-9][0-9]*$
17. (2) This matches any number that does not start with 0, but it also means that a single "0" does not pass, so we use the following form:
^(0|[1-9][0-9]*)$
18. (3) A 0, or a number that does not start with 0. We can also allow a minus sign at the beginning:
^(0|-?[1-9][0-9]*)$
19. (4) This matches a 0, or a possibly negative number that does not start with 0. Let the user start with 0, and remove the minus sign, because money cannot be negative. What we need to add next is the optional decimal part:
^[0-9]+(\.[0-9]+)?$
20. (5) Note that there must be at least one digit after the decimal point, so "10." does not pass, but "10" and "10.2" do:
^[0-9]+(\.[0-9]{2})?$
21. (6) This requires exactly two digits after the decimal point. If you think that is too strict, you can do this:
^[0-9]+(\.[0-9]{1,2})?$
22. (7) This allows the user to write just one decimal place. Next we should consider commas in the number. We can do this:
^[0-9]{1,3}(,[0-9]{3})*(\.[0-9]{1,2})?$
23. (8) That is 1 to 3 digits, followed by any number of groups of a comma and 3 digits. To make the commas optional instead of required:
^([0-9]+|[0-9]{1,3}(,[0-9]{3})*)(\.[0-9]{1,2})?$
24. Note: This is the final result. Don't forget that "+" can be replaced with "*" if you think an empty string is acceptable (strange, why would you?). Finally, pay attention to how backslashes are handled when you pass the pattern to a function; mistakes commonly occur there.
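The final money expression can be checked against the four accepted formats and a few invalid inputs. This is a sketch in which the decimal point is escaped as \. and the sample list is made up for illustration:

```php
<?php
$money = '/^([0-9]+|[0-9]{1,3}(,[0-9]{3})*)(\.[0-9]{1,2})?$/';

$samples = ['10000.00', '10,000.00', '10000', '10,000', '10.', '1,00', '10.123'];

// Keep only the samples that the pattern accepts.
$accepted = array_values(array_filter($samples, function ($s) use ($money) {
    return preg_match($money, $s) === 1;
}));

print_r($accepted);
```

The first four samples pass; "10." lacks a digit after the point, "1,00" has a two-digit group after the comma, and "10.123" has three decimal places.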
25. XML file:
^([a-zA-Z]+-?)+[a-zA-Z0-9]+\.[xX][mM][lL]$
26. Regular expression for Chinese characters:
[\u4e00-\u9fa5]
27. Double-byte characters:
[^\x00-\xff]
(Includes Chinese characters. It can be used to calculate the length of a string, counting a double-byte character as 2 and an ASCII character as 1.)
28. Regular expression for blank lines: \n\s*\r (can be used to delete blank lines)
29. Regular expression for HTML tags:
<(\S*?)[^>]*>.*?</\1>|<.*? />
(The versions circulating on the Internet are poor. The one above is only partially effective and is still powerless against complex nested tags.)
30. Regular expression for leading and trailing whitespace: ^\s*|\s*$ or (^\s*)|(\s*$) (can be used to delete whitespace at the beginning and end of a line, including spaces, tabs, form feeds, etc.; a very useful expression)
31. Tencent QQ number: [1-9][0-9]{4,} (Tencent QQ numbers start from 10000)
32. China postal code: [1-9]\d{5}(?!\d) (China postal codes have 6 digits)
33. IP address: \d+\.\d+\.\d+\.\d+ (useful when extracting IP addresses)
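Two of these expressions are shown in use below: extracting IP addresses with preg_match_all() and trimming whitespace with preg_replace(). This is a rough sketch; the log line and the padded string are hypothetical inputs.

```php
<?php
// Hypothetical log line containing two IP addresses.
$log = 'client 192.168.1.10 connected to 10.0.0.1';
preg_match_all('/\d+\.\d+\.\d+\.\d+/', $log, $found);
$ips = $found[0];

// Remove leading and trailing whitespace (item 30).
$trimmed = preg_replace('/^\s*|\s*$/', '', "  \t hello world \n");

print_r($ips);
var_dump($trimmed);
```

Note that the IP pattern only checks the shape of the address; it does not verify that each part is between 0 and 255.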