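I came across a tutorial of PHP regular expression examples that is really quite good, so I am recording it here.
Regular expressions are used for string processing, form validation and similar tasks, and they are practical and efficient.
Some commonly used expressions: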
$str = preg_replace("/(
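It uses three subpatterns (the content of each pair of parentheses is one subpattern): the first is the opening tag of the link, the second is the link text, and the third is the closing tag.
In the second argument, \1, \2 and \3 then stand for these three parts, so replacing them with whatever form you need is straightforward.
A PHP function to get all link addresses in a page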
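The following function, written in PHP, extracts all link addresses from an arbitrary string $string (which may be read directly from an HTML page file) and returns them in an array. The function automatically excludes e-mail addresses, and the returned array contains no duplicate elements.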
function GetAllLink($string)
{
// Note: eregi_replace() was removed in PHP 7.0; on current versions use preg_replace() with the i modifier.
$i = 0;
$output = array();
$string = str_replace("\r","",$string);
$string = str_replace("\n","",$string);
$regex['url'] = "((http|https|ftp|telnet|news):\/\/)?([a-z0-9_\-\/\.]+\.[][a-z0-9:;@=_~%\?\/\.\,\+\-]+)";
$regex['email'] = "([a-z0-9_\-]+)@([a-z0-9_\-]+\.[a-z0-9\-\._\-]+)";
// Remove the text between tags
$string = eregi_replace(">[^<>]+<","><", $string);
// Remove JavaScript code
$string = eregi_replace("","", $string);
// Remove HTML tags other than <a>
$string = eregi_replace("<[^a][^<>]*>","", $string);
// Remove e-mail links
$string = eregi_replace("]*>","", $string);
// Replace the page links we want to keep
$string = eregi_replace("]*>","\\3\t", $string);
$output[0] = strtok($string, "\t");
while(($temp = strtok("\t")))
{
if($temp && !in_array($temp, $output))
$output[++$i] = $temp;
}
return $output;
}
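Because the ereg functions are no longer available on current PHP versions, the same idea can be sketched with preg_match_all(). The helper name extractLinks() and the simplified href pattern below are assumptions made for illustration, not the original author's code; for real pages an HTML parser such as DOMDocument is usually more robust.

```php
<?php
// Hypothetical helper: collect unique href values, skipping mailto: links.
function extractLinks(string $html): array
{
    // Capture the value of href="..." or href='...' inside <a> tags.
    preg_match_all('/<a\s[^>]*href\s*=\s*["\']([^"\']+)["\']/i', $html, $m);
    $links = [];
    foreach ($m[1] as $href) {
        if (stripos($href, 'mailto:') === 0) {
            continue; // exclude e-mail links
        }
        $links[$href] = true; // array keys remove duplicates
    }
    return array_keys($links);
}

$html = '<a href="http://example.com/a">A</a> <a href="mailto:x@example.com">M</a> '
      . '<a class="k" href=\'/b.html\'>B</a> <a href="http://example.com/a">A again</a>';
print_r(extractLinks($html));
```

Using the link as an array key is one simple way to drop duplicates while keeping the order of first appearance.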
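The following examples are written in PHP syntax.
Check that a string contains only digits and English letters and is 4 to 16 characters long
$str = 'a1234';
if (preg_match("/^[a-zA-Z0-9]{4,16}$/", $str)) {
echo "Validation succeeded";
} else {
echo "Validation failed";
}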
Simple ID card number validation (15 or 18 digits)
$str = 'a1234';
if (preg_match("/^(?:\d{15}|\d{18})$/", $str)) {
echo "Validation succeeded";
} else {
echo "Validation failed";
}
The following code renders code blocks embedded in text, much like the code listings you see on Script Academy (脚本学堂).
function codedisp($code) {
global $discuzcodes;
$discuzcodes['pcodecount']++;
$code = htmlspecialchars(str_replace('\\"', '"', preg_replace("/^[\n\r]*(.+?)[\n\r]*$/is", "\\1", $code)));
$discuzcodes['codehtml'][$discuzcodes['pcodecount']] = "
Regular expression to match Chinese characters: [\u4e00-\u9fa5]
Comment: Matching Chinese can be a real headache; this expression makes it easy to handle
Regular expression to match double-byte characters (including Chinese characters): [^\x00-\xff]
Comment: Can be used to calculate the length of a string (a double-byte character counts as 2, an ASCII character counts as 1)
Regular expression to match blank lines: \n\s*\r
Comment: Can be used to delete blank lines
Regular expression to match HTML tags: <(\S*?)[^>]*>.*?</\1>|<.*? />
Comment: The versions circulating on the Internet are poor. The one above matches only some cases and is still powerless against complex nested tags.
Regular expression to match leading and trailing whitespace: ^\s*|\s*$
Comment: Can be used to delete whitespace (spaces, tabs, form feeds, etc.) at the beginning and end of a line. A very useful expression.
Regular expression to match an e-mail address: \w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*
Comment: Very useful for form validation
Regular expression to match a URL: [a-zA-Z]+://[^\s]*
Comment: The versions circulating on the Internet are very limited; the one above basically meets the need
Regular expression to check whether an account name is legal (starts with a letter, 5-16 bytes, letters, digits and underscores allowed): ^[a-zA-Z][a-zA-Z0-9_]{4,15}$
Comment: Very useful for form validation
Matching domestic phone numbers: \d{3}-\d{8}|\d{4}-\d{7}
Comment: Matches formats such as 0511-4405222 or 021-87888822
Matching Tencent QQ numbers: [1-9][0-9]{4,}
Comment: Tencent QQ numbers start from 10000
Matching Chinese postal codes: [1-9]\d{5}(?!\d)
Comment: Chinese postal codes have 6 digits
Matching ID card numbers: \d{15}|\d{18}
Comment: Chinese ID card numbers have 15 or 18 digits
Matching IP addresses: \d+\.\d+\.\d+\.\d+
Comment: Useful when extracting IP addresses
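As a rough illustration, a few of the patterns listed above can be tried with preg_match(). Two details are assumptions of this sketch rather than part of the original list: the patterns are anchored with ^ and $ so that they validate a whole string, and the Chinese range is written as \x{4e00}-\x{9fa5} with the u modifier, because PCRE does not accept the \uXXXX notation. The sample values are made up.

```php
<?php
// Patterns from the list above, anchored for whole-string validation.
$patterns = [
    'chinese'  => '/^[\x{4e00}-\x{9fa5}]+$/u', // PCRE spelling of [\u4e00-\u9fa5]
    'email'    => '/^\w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*$/',
    'postcode' => '/^[1-9]\d{5}$/',
    'qq'       => '/^[1-9][0-9]{4,}$/',
];

$samples = [
    'chinese'  => '正则表达式',
    'email'    => 'user.name@example.com',
    'postcode' => '100080',
    'qq'       => '9999', // too short: QQ numbers start at 10000
];

foreach ($samples as $name => $value) {
    $ok = preg_match($patterns[$name], $value) === 1;
    echo $name, ': ', $ok ? 'match' : 'no match', "\n";
}
```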
Matching specific numbers:
^[1-9]\d*$ // Match positive integers
^-[1-9]\d*$ // Match negative integers
^-?[1-9]\d*$ // Match nonzero integers
^([1-9]\d*|0)$ // Match non-negative integers (positive integers + 0)
^(-[1-9]\d*|0)$ // Match non-positive integers (negative integers + 0)
^([1-9]\d*\.\d*|0\.\d*[1-9]\d*)$ // Match positive floating point numbers
^-([1-9]\d*\.\d*|0\.\d*[1-9]\d*)$ // Match negative floating point numbers
^-?([1-9]\d*\.\d*|0\.\d*[1-9]\d*|0?\.0+|0)$ // Match floating point numbers
^([1-9]\d*\.\d*|0\.\d*[1-9]\d*|0?\.0+|0)$ // Match non-negative floating point numbers (positive floating point numbers + 0)
^((-([1-9]\d*\.\d*|0\.\d*[1-9]\d*))|0?\.0+|0)$ // Match non-positive floating point numbers (negative floating point numbers + 0)
Comment: Useful when processing large amounts of data; adjust the patterns to fit the specific application.
Matching specific strings:
^[A-Za-z]+$ // Matches a string consisting of the 26 English letters
^[A-Z]+$ // Matches a string consisting of the 26 uppercase English letters
^[a-z]+$ // Matches a string consisting of the 26 lowercase English letters
^[A-Za-z0-9]+$ // Matches a string consisting of digits and the 26 English letters
^\w+$ // Matches a string consisting of digits, the 26 English letters or underscores
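One detail worth checking when adapting anchored patterns like these: alternation binds more loosely than ^ and $, so an ungrouped form such as ^[1-9]\d*|0$ means "starts with a positive integer" or "ends with 0". The short sketch below, with made-up test strings, compares the ungrouped and grouped forms.

```php
<?php
$loose  = '/^[1-9]\d*|0$/';   // each anchor applies to one alternative only
$strict = '/^([1-9]\d*|0)$/'; // both anchors apply to the whole alternation

foreach (['0', '42', '42abc', 'abc0'] as $s) {
    printf("%-6s loose=%d strict=%d\n", $s, preg_match($loose, $s), preg_match($strict, $s));
}
```

Only the grouped form rejects "42abc" and "abc0".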
The following are some special characters:
Special characters in regular expressions (reference book: "Mastering Regular Expressions"):
Character \
Meaning: Before a character that is normally taken literally, it indicates that the following character is special and is not to be interpreted literally.
For example: /b/ matches the character 'b'. With a backslash in front of the b, that is /\b/, the character becomes special and matches a word boundary.
Or:
Before a character that is normally special, it indicates that the following character is not special and should be interpreted literally.
For example: * is a special character that matches the preceding character any number of times (including zero); /a*/ matches zero or more a's. To match a literal *, put a backslash before the *; for example, /a\*/ matches 'a*'.
Character ^
Meaning: The match must occur at the beginning of the input.
For example: /^A/ does not match the 'A' in "an A", but matches the first 'A' in "An A.".
Character $
Meaning: Like ^, but matches at the end of the input.
For example: /t$/ does not match the 't' in "eater", but does match the 't' in "eat".
Character *
Meaning: Matches the character before * zero or more times.
For example: /bo*/ matches the 'boooo' in "A ghost booooed" and the 'b' in "A bird warbled", but matches nothing in "A goat grunted".
Character +
Meaning: Matches the character before + one or more times. Equivalent to {1,}.
For example: /a+/ matches the 'a' in "candy" and all the a's in "caaaaaaandy".
Character ?
Meaning: Matches the character before ? zero or one time.
For example: /e?le?/ matches the 'el' in "angel" and the 'le' in "angle".
Character .
Meaning: (decimal point) Matches any single character except a newline.
For example: /.n/ matches 'an' and 'on' in "nay, an apple is on the tree", but does not match 'nay'.
Character (x)
Meaning: Matches 'x' and remembers the match.
For example: /(foo)/ matches and remembers 'foo' in "foo bar". The matched substring can be read from elements [1], ..., [n] of the result array (in JavaScript, also from the properties $1, ..., $9 of the RegExp object).
Character x|y
Meaning: Matches 'x' or 'y'.
For example: /green|red/ matches the 'green' in "green apple" and the 'red' in "red apple".
Character {n}
Meaning: n is a positive integer. Matches exactly n occurrences of the preceding character.
For example: /a{2}/ does not match the 'a' in "candy", but matches both a's in "caandy" and the first two a's in "caaandy".
Character {n,}
Meaning: n is a positive integer. Matches at least n occurrences of the preceding character.
For example: /a{2,}/ does not match the 'a' in "candy", but matches all the a's in "caandy" and all the a's in "caaaaaaandy".
Character {n,m}
Meaning: n and m are both positive integers. Matches at least n and at most m occurrences of the preceding character.
For example: /a{1,3}/ matches nothing in "cndy", the 'a' in "candy", the two a's in "caandy", and the first three a's in "caaaaaaandy". Note that however many a's "caaaaaaandy" contains, only the first three, "aaa", are matched.
Character [xyz]
Meaning: A character list; matches any one of the listed characters. A range of characters can be given with a hyphen (-).
For example: [abcd] is the same as [a-d]. They match the 'b' in "brisket" and the 'c' in "ache".
Character [^xyz]
Meaning: A complemented character list; matches anything except the listed characters. A range of characters can be given with a hyphen (-).
For example: [^abc] and [^a-c] are equivalent; they first match the 'r' in "brisket" and the 'h' in "chop".
Character [\b]
Meaning: Matches a backspace (not to be confused with \b).
Character \b
Meaning: Matches a word boundary, such as a space (not to be confused with [\b]).
For example: /\bn\w/ matches the 'no' in "noonday"; /\wy\b/ matches the 'ly' in "possibly yesterday".
Character \B
Meaning: Matches a non-word boundary.
For example: /\w\Bn/ matches the 'on' in "noonday"; /y\B\w/ matches the 'ye' in "possibly yesterday".
Character \cX
Meaning: X is a control character. Matches that control character in a string.
For example: /\cM/ matches control-M in a string.
Character \d
Meaning: Matches a digit, equivalent to [0-9].
For example: /\d/ or /[0-9]/ matches the '2' in "B2 is the suite number".
Character \D
Meaning: Matches any non-digit, equivalent to [^0-9].
For example: /\D/ or /[^0-9]/ matches the 'B' in "B2 is the suite number".
Character \f
Meaning: Matches a form feed.
Character \n
Meaning: Matches a newline.
Character \r
Meaning: Matches a carriage return.
Character \s
Meaning: Matches a single whitespace character, including space, tab, form feed and newline; equivalent to [ \f\n\r\t\v].
For example: /\s\w*/ matches ' bar' in "foo bar".
Character \S
Meaning: Matches a single character other than whitespace; equivalent to [^ \f\n\r\t\v].
For example: /\S\w*/ matches 'foo' in "foo bar".
Character \t
Meaning: Matches a tab.
Character \v
Meaning: Matches a vertical tab.
Character \w
Meaning: Matches any digit, letter or underscore; equivalent to [A-Za-z0-9_].
For example: /\w/ matches the 'a' in "apple", the '5' in "$5.28" and the '3' in "3D".
Character \W
Meaning: Matches any character other than digits, letters and underscores; equivalent to [^A-Za-z0-9_].
For example: /\W/ or /[^A-Za-z0-9_]/ matches the '%' in "50%".
Character \n
Meaning: n is a positive integer. A back reference to the substring matched by the nth parenthesized group of the regular expression (counting left parentheses).
Note: If the number of left parentheses is smaller than n, \n is taken as an octal escape, as described in the next entry.
Character \ooctal, \xhex
Meaning: ooctal is an octal escape value and xhex is a hexadecimal escape value; they allow ASCII codes to be embedded in a regular expression.
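A few of the entries above can be checked directly in PHP. The sketch below uses preg_match() with the "noonday" example from the table; the subject "letter" for the back reference is an illustrative choice that does not come from the original text.

```php
<?php
// \b is a word boundary: "no" is matched only at the start of the word.
preg_match('/\bn\w/', 'noonday', $m1);
echo $m1[0], "\n"; // no

// \B is a non-boundary: the "on" inside the word is matched.
preg_match('/\w\Bn/', 'noonday', $m2);
echo $m2[0], "\n"; // on

// (x) remembers a match, and \1 refers back to the first group.
preg_match('/(\w)\1/', 'letter', $m3);
echo $m3[0], "\n"; // tt
```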
Delimiter: "/" is usually used to mark the beginning and end of a pattern, but "#" can be used as well.
When should you use "#"? Usually when the string contains many "/" characters, as a URI does, because each "/" would otherwise have to be escaped inside the pattern.
The code using the "/" delimiter is as follows.
$regex = '/^http:\/\/([\w.]+)\/([\w]+)\/([\w]+)\.html$/i';
$str = 'http://www.jbxue.com/show_page/id_ABCDEFG.html';
$matches = array();
if (preg_match($regex, $str, $matches)){
var_dump($matches);
}
echo "\n";
After preg_match() succeeds, $matches[0] contains the text that matched the entire pattern, and $matches[1], $matches[2], ... contain the text captured by each subpattern.
The code using the "#" delimiter is as follows. With this delimiter, "/" does not need to be escaped.
$regex = '#^http://([\w.]+)/([\w]+)/([\w]+)\.html$#i';
$str = 'http://www.jbxue.com/show_page/id_ABCDEFG.html';
$matches = array();
if(preg_match($regex, $str, $matches)){
var_dump($matches);
}
echo "\n";
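When part of a pattern comes from a variable, preg_quote() can do the escaping; its optional second argument names the delimiter so that it is escaped too. The following is a minimal sketch, assuming the host name is the variable part; the variable $host is introduced only for this illustration.

```php
<?php
$host = 'www.jbxue.com'; // literal text that contains "." metacharacters

// preg_quote() escapes regex metacharacters; the second argument also
// escapes the chosen delimiter if it occurs in the text.
$regex = '#^http://' . preg_quote($host, '#') . '/(\w+)/(\w+)\.html$#i';

$str = 'http://www.jbxue.com/show_page/id_ABCDEFG.html';
if (preg_match($regex, $str, $matches)) {
    echo $matches[1], ' ', $matches[2], "\n"; // show_page id_ABCDEFG
}
```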
Modifier: used to change the behavior of a regular expression.
The trailing "i" in '/^http:\/\/([\w.]+)\/([\w]+)\/([\w]+)\.html/i' is a modifier that makes matching case-insensitive. Another commonly used modifier is "x", which makes the engine ignore whitespace in the pattern.
Code:
$regex = '/HELLO/';
$str = 'hello word';
$matches = array();
if(preg_match($regex, $str, $matches)){
echo 'No i:Valid Successful!',"\n";
}
if(preg_match($regex.'i', $str, $matches)){
echo 'YES i:Valid Successful!',"\n";
}
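The "x" modifier is easiest to appreciate on a pattern spread over several lines. The date pattern below is a made-up illustration, not taken from the original article.

```php
<?php
// With the x modifier, unescaped whitespace and #-comments inside the
// pattern are ignored, so a long pattern can be laid out over several lines.
$regex = '/
    ^(\d{4})   # year
    -(\d{2})   # month
    -(\d{2})$  # day
/x';

if (preg_match($regex, '2024-01-31', $m)) {
    echo $m[1], '/', $m[2], '/', $m[3], "\n"; // 2024/01/31
}
```

Whitespace in the subject string is still significant; only whitespace written in the pattern is skipped.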
Character field: in [\w], the part enclosed in square brackets is the character field (also called a character class).
Qualifier: as in [\w]{3,5}, [\w]* or [\w]+. The symbols that follow [\w] are qualifiers (quantifiers); their meanings are as follows.
{3,5} means 3 to 5 characters, {3,} means 3 or more characters, and {3} means exactly three characters. For "up to 5 characters" the portable form is {0,5}; older PCRE versions treat {,5} as literal text rather than as a quantifier.
* means 0 or more.
+ means 1 or more.
Caret ^:
> placed inside a character field (such as [^\w]), it expresses negation (do not include), that is, "reverse selection";
> placed at the start of an expression, it means "begins with" (/^n/i means starting with n).
Note that "\" is usually called the "escape character". It is used to escape special symbols such as "." and "/".
Delimiter: a regular expression generally has the following form:
/love/
The part between the "/" delimiters is the pattern to be matched in the target object.
Metacharacters: special characters with a special meaning in regular expressions, which specify how their leading character (the character in front of the metacharacter) may appear in the target object.
The more commonly used metacharacters include "+", "*" and "?".
The "+" metacharacter specifies that its leading character must appear one or more times in a row in the target object.
The "*" metacharacter specifies that its leading character must appear zero or more times in a row in the target object.
The "?" metacharacter specifies that its leading character must appear zero times or once in the target object.
Now, let's take a look at specific applications of these metacharacters.
/fo+/
Because the above regular expression contains the "+" metacharacter (the "o" in front of it is the leading character), it matches strings such as "fool" or "fo" in the target object, in which one or more letters o appear consecutively after f.
In addition to metacharacters, users can specify exactly how often a pattern appears in the matched object. For example,
/jim{2,6}/
The above regular expression specifies that the character m must appear 2 to 6 times in a row in the matched object. It therefore matches strings such as jimmy or jimmmmmy.
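The two patterns just discussed can be tried against a handful of strings. This is only a small sketch; the subjects "f" and "jim" are added here as negative cases and do not come from the original text.

```php
<?php
$tests = [
    '/fo+/'      => ['f', 'fo', 'fool'],
    '/jim{2,6}/' => ['jim', 'jimmy', 'jimmmmmy'],
];

foreach ($tests as $regex => $subjects) {
    foreach ($subjects as $s) {
        echo $regex, ' vs ', $s, ': ', preg_match($regex, $s) ? 'match' : 'no match', "\n";
    }
}
```

"f" fails because "+" requires at least one o, and "jim" fails because {2,6} requires at least two m's.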
How to use several other important metacharacters:
\s: matches a single whitespace character, including tabs and newlines;
\S: matches any single character that is not whitespace;
\d: matches a digit from 0 to 9;
\w: matches a letter, digit or underscore;
\W: matches any character that \w does not match;
.: matches any character except a newline.
(Note: \s and \S, and \w and \W, can be regarded as inverses of each other.)
Below, we will look at how to use these metacharacters in regular expressions through examples.
/\s+/
The above regular expression matches one or more whitespace characters in the target object.
In addition to the metacharacters introduced above, regular expressions have another kind of special character, the locator (anchor).
Locator: used to specify where the matching pattern appears in the target object.
Commonly used locators include "^", "$", "\b" and "\B".
The "^" locator specifies that the matching pattern must appear at the beginning of the target string.
The "$" locator specifies that the matching pattern must appear at the end of the target string.
The "\b" locator specifies that the matching pattern must appear at a word boundary, that is, at the beginning or the end of a word in the target string.
The "\B" locator specifies that the match must lie inside a word, away from its two boundaries; the matched position can be neither the beginning nor the end of a word. Likewise, "^" and "$", and "\b" and "\B", can be regarded as two pairs of locators that are inverses of each other. For example:
/^hell/
Because the above regular expression contains the "^" locator, it matches strings in the target object that start with "hell", such as "hello" or "hellhound".
/ar$/
Because the above regular expression contains the "$" locator, it matches strings in the target object that end with "ar", such as "car" or "bar".
/\bbom/
Because the above regular expression starts with the "\b" locator, it matches words in the target object that start with "bom", such as "bomb".
/man\b/
Because the above regular expression ends with the "\b" locator, it matches words in the target object that end with "man", such as "human" or "woman".
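The four locator examples can be compared side by side with preg_grep(), which returns the array elements that match a pattern. The word list below is an assumption chosen so that each pattern has at least one hit and one miss.

```php
<?php
$words = ['hello', 'shell', 'car', 'art', 'bomb', 'atom bomb', 'human', 'manual'];

print_r(preg_grep('/^hell/', $words)); // hello
print_r(preg_grep('/ar$/', $words));   // car
print_r(preg_grep('/\bbom/', $words)); // bomb, atom bomb
print_r(preg_grep('/man\b/', $words)); // human
```

preg_grep() keeps the original array keys, so the printed indexes are those of $words.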
To let users set matching patterns more flexibly, regular expressions allow a range to be specified in the pattern instead of specific characters. For example:
/[A-Z]/
The above regular expression matches any uppercase letter from A to Z.
/[a-z]/
The above regular expression matches any lowercase letter from a to z.
/[0-9]/
The above regular expression matches any digit from 0 to 9.
/([a-z][A-Z][0-9])+/
The above regular expression matches one or more repetitions of a lowercase letter followed by an uppercase letter followed by a digit, such as "aB0". Note that "()" can be used in regular expressions to group parts of a pattern together.
"()" symbol: the enclosed items must appear together, in that order, in the target object. Therefore the above regular expression does not match a string such as "abc", because its second character is not an uppercase letter and its last character is a letter rather than a digit.
To implement something like the "OR" operation of programming logic and choose any one of several patterns for matching, use the pipe character "|". For example:
/to|too|2/
The above regular expression matches "to", "too" or "2" in the target object.
Negation: "[^]". Unlike the locator "^" introduced earlier, the negation "[^]" specifies that the characters listed in the pattern must not be matched in the target object. For example:
/[^A-C]/
The above pattern matches any character in the target object except A, B and C. Generally speaking, when "^" appears at the start of "[]" it is a negation operator; when "^" is outside "[]", or there is no "[]", it is a locator.
Finally, when users need to use a metacharacter literally in a pattern, they can use the escape character "\". For example:
/Th\*/
The above regular expression matches "Th*" in the target object, not "The" and so on.
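The effect of escaping can be seen by comparing the escaped and unescaped forms; preg_quote() produces the escaped form automatically. A short sketch:

```php
<?php
var_dump(preg_match('/Th*/', 'The'));   // int(1): "h*" means zero or more h
var_dump(preg_match('/Th\*/', 'The'));  // int(0): needs a literal "*" after "Th"
var_dump(preg_match('/Th\*/', 'Th*'));  // int(1)
echo preg_quote('Th*'), "\n";           // Th\*
```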
Introduction to practical experience
We still have to talk about ^ and $. They match the beginning and the end of a string respectively. For example:
"^The": the string must begin with "The";
"of despair$": the string must end with "of despair";
Then,
"^abc$": a string that both begins and ends with abc; in fact only "abc" itself matches;
"notice": matches any string that contains notice.
As the last example shows, if neither of the two characters is used, the pattern (regular expression) can appear anywhere in the tested string, because it is not locked to either end.
Next, let's talk about '*', '+' and '?'.
They indicate how many times a character may occur. Respectively they mean:
"zero or more", equivalent to {0,}
"one or more", equivalent to {1,}
"zero or one", equivalent to {0,1}
Here are some examples:
"ab*": synonymous with ab{0,}; matches an a followed by zero or more b's ("a", "ab", "abbb", etc.);
"ab+": synonymous with ab{1,}; like the previous one, but there must be at least one b ("ab", "abbb", etc.);
"ab?": synonymous with ab{0,1}; there may be no b or exactly one b;
"a?b+$": matches a string ending with zero or one a followed by one or more b's.
Key point: '*', '+' and '?' apply only to the character immediately before them.
You can also limit the number of occurrences with curly brackets, for example:
"ab{2}": a must be followed by exactly two b's ("abb");
"ab{2,}": a must be followed by two or more b's ("abb", "abbbb", etc.);
"ab{3,5}": a must be followed by three to five b's ("abbb", "abbbb" or "abbbbb").
Now we put some characters into parentheses, for example:
"a(bc)*": matches a followed by zero or more "bc";
"a(bc){1,5}": a followed by one to five "bc";
There is also the character '|', which acts as an OR operator:
"hi|hello": matches a string containing "hi" or "hello";
"(b|cd)ef": matches a string containing "bef" or "cdef";
"(a|b)*c": matches a string containing any number (including zero) of a's or b's followed by a c;
A dot ('.') stands for any single character except "\n".
What if you want to match any single character including "\n"?
Use the s modifier, which lets the dot match newlines, or a character class such as '[\s\S]'. (Inside square brackets the dot is literal, so '[\n.]' matches only a newline or a literal dot.)
"a.[0-9]": an a, then any character, then a digit from 0 to 9;
"^.{3}$": a string consisting of exactly three arbitrary characters.
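How the dot treats a newline can be verified with a two-line subject. The sketch below compares the plain dot, the s modifier and a character class that covers every character; the subject string is made up for the illustration.

```php
<?php
$text = "a\nb";

var_dump(preg_match('/a.b/', $text));      // int(0): "." stops at the newline
var_dump(preg_match('/a.b/s', $text));     // int(1): s lets "." match "\n"
var_dump(preg_match('/a[\s\S]b/', $text)); // int(1): class covering every character
```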
The content enclosed in square brackets matches only a single character:
"[ab]": matches a single a or b (same as "a|b");
"[a-d]": matches a single character from 'a' to 'd' (same effect as "a|b|c|d" and "[abcd]");
Generally we use [a-zA-Z] to specify one uppercase or lowercase English letter:
"^[a-zA-Z]": matches a string starting with an uppercase or lowercase letter;
"[0-9]%": matches a string containing a digit followed by a percent sign.
You can also list the characters you do not want inside the square brackets; just put '^' at the beginning of the bracket. "%[^a-zA-Z]%" matches a string containing a non-letter between two percent signs.
Key point: when ^ is used at the beginning of square brackets, it excludes the characters in the brackets.
For PHP to interpret a pattern, it must be written as a quoted string, and characters with a special meaning must be escaped with "\" when they are meant literally.
Don't forget that characters inside square brackets are an exception to this rule: in the POSIX syntax used by the ereg functions, all special characters inside brackets, including "\", lose their special meaning, so "[*\+?{}.]" matches a string containing any of these characters. (In PCRE, used by the preg functions, "\" remains an escape character inside brackets.)
Also, as the regex manual tells us: "If the list contains ']', it is best to put it first in the list (possibly after '^'). If it contains '-', it is best to put it first or last, or as the second endpoint of a range; the '-' in the middle of [a-d-0-9] will be valid."
After reading the above examples, {n,m} should be clear. Note that neither n nor m can be negative, and n must not be greater than m; the item is then matched at least n and at most m times. For example, "p{1,5}" applied to "pvpppppp" first matches the single leading p and then the first five of the six consecutive p's.
Now for the escapes that begin with "\".
\b: the book says it matches a word boundary. For example, 've\b' matches the ve in "love" but not the ve in "very".
\B: just the opposite of \b above.
Other uses of regular expressions
Extracting strings
ereg() and eregi() allow part of a string to be extracted with a regular expression (see the manual for details); note that the ereg functions were removed in PHP 7.0. For example, to extract the file name from a path or URL, the following code is what you need:
ereg("([^/]*)$", $pathOrUrl, $regs);
echo $regs[1];
Advanced substitutions
ereg_replace() and eregi_replace() are also very useful. For example, to replace every run of separating whitespace with a comma:
ereg_replace("[ \t]+", ",", trim($str));
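On PHP 7 and later the two ereg examples above can be expressed with the preg functions. The following is a minimal sketch of one possible translation; the sample values of $pathOrUrl and $str are assumptions made for the demonstration.

```php
<?php
$pathOrUrl = 'http://www.jbxue.com/show_page/id_ABCDEFG.html';

// preg_match() counterpart of ereg("([^/]*)$", ...): the last path segment.
preg_match('#([^/]*)$#', $pathOrUrl, $regs);
echo $regs[1], "\n"; // id_ABCDEFG.html

// preg_replace() counterpart of ereg_replace("[ \t]+", ",", ...).
$str = "  one two \t three ";
$csv = preg_replace('/[ \t]+/', ',', trim($str));
echo $csv, "\n"; // one,two,three
```

The "#" delimiter is used in the first pattern so that the "/" inside the character class needs no escaping.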
preg_match() and preg_match_all()
preg_quote()
preg_split()
preg_grep()
preg_replace()
Details on each of these functions can be found in the PHP manual. Here are some regular expressions we have accumulated:
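As a quick orientation before the accumulated examples, here is a small sketch of two of the functions listed above, preg_split() and preg_match_all(). The input strings are made up for the illustration.

```php
<?php
// preg_split(): split on commas or whitespace, dropping empty pieces.
$parts = preg_split('/[\s,]+/', 'php, regex  tutorial', -1, PREG_SPLIT_NO_EMPTY);
print_r($parts); // php, regex, tutorial

// preg_match_all(): collect every match rather than only the first one.
$count = preg_match_all('/\d+/', 'a1 b22 c333', $all);
echo $count, ': ', implode(' ', $all[0]), "\n"; // 3: 1 22 333
```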
Match action attribute
$match = '';
preg_match_all('/\s+action="(?!http:)(.*?)"\s/', $str, $match);
print_r($match);
Use callback functions in regular expressions
/**
 * Replace some string by callback function
 */
function callback_replace() {
$url = 'http://esfang.house.sina.com.cn';
$str = '';
// The /e modifier was removed in PHP 7.0, so preg_replace_callback() is used instead.
$str = preg_replace_callback('/(?<=\saction=")(?!http:)(.*?)(?="\s)/', function ($m) use ($url) {
return search($url, $m[1]);
}, $str);
echo $str;
}
function search($url, $match){
return $url . '/' . $match;
}
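To see a callback replacement at work on concrete input, the sketch below rewrites a relative action attribute into an absolute one. The base URL, the sample form tag and the ltrim() call that avoids a double slash are all assumptions of this illustration, not part of the original code.

```php
<?php
$base = 'http://example.com'; // assumed base URL
$html = '<form action="/login.php" method="post">';

$out = preg_replace_callback(
    '/(?<=\saction=")(?!http:)(.*?)(?="\s)/',
    function (array $m) use ($base) {
        // $m[1] holds the relative address captured by the group.
        return $base . '/' . ltrim($m[1], '/');
    },
    $html
);
echo $out, "\n"; // <form action="http://example.com/login.php" method="post">
```

The lookbehind and lookahead keep the attribute name and the quotes out of the match, so only the address itself is replaced.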
Regular matching with assertions
$match = '';
$str = '<p>paragraph text</p>';
preg_match_all('/(?<=<(\w{1})>).*(?=<\/\1>)/', $str, $match);
echo "Match the content in HTML tags without attributes: ";
print_r($match);
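Replace the address in the HTML source code
// The /e modifier was removed in PHP 7.0, so preg_replace_callback() is used instead.
$form_html = preg_replace_callback('/(?<=\saction="|\ssrc="|\shref=")(?!http:|javascript)(.*?)(?="\s)/', function ($m) use ($url) {
return add_url($url, $m[1]);
}, $form_html);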
Metacharacters
In the above examples, symbols such as ^, \d and $ carry specific matching meanings; we call them metacharacters. Commonly used metacharacters are as follows:
. matches any character except a newline
\w matches a letter, digit or underscore
\s matches any whitespace character
\d matches a digit
\b matches the beginning or end of a word
^ matches the beginning of the string
$ matches the end of the string
[x] matches the character x; for example, [abc] matches a, b or c
\W the opposite of \w: matches any character that is not a letter, digit or underscore
\S the opposite of \s: matches any non-whitespace character
\D the opposite of \d: matches any non-digit character
\B the opposite of \b: matches a position that is not the beginning or end of a word
[^x] matches any character except x; for example, [^abc] matches any character except a, b and c
For reference only: O(∩_∩)O~
1./\bKevin\b Chang\b/
2./.{6,}/
3./.{1,6}/ // At least 1 byte, at most 6 bytes
4./^[a-z][0-9]*$/i // Note the exact match enforced by ^ and $
5./\w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*/
The code is as follows:
$str=file_get_contents('abc.com/aaa.php');
if (preg_match('||', $str, $reg)) $out=$reg[1];
else $out='';
echo "$out
\n";
?>