About the use of RegExp objects and brackets in JS regular expressions

不言
Release: 2018-06-30 13:49:57

The following is a brief discussion on the use of RegExp objects and brackets in JS regular expressions. The content is quite good, so I will share it with you now and give it as a reference.

Creation of RegExp objects:

Regular expressions are usually created as literals, that is, a pattern enclosed in slashes "/". When the pattern has to be built from parameters at run time, the RegExp() constructor is the better choice:

var reg1 = /'\w+'/g;
var reg2 = new RegExp('\'\\w+\'', 'g');

Comparing the two, the first argument of RegExp() is the pattern written as a string. Because it is not a literal, it is not enclosed in slashes "/"; instead, the quotation mark "'" and the escape character "\" must themselves be escaped inside the string.

In addition, both the literal form and the RegExp() constructor produce a new RegExp object, which is then assigned to the variable.
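To make the "parameter changes" case concrete, here is a small sketch. The helper name quotedWordPattern is made up for illustration and is not part of any library; it only shows how a pattern string is assembled and why "\\w" is written with two backslashes.

```javascript
// Hypothetical helper: builds a global regex that matches a quoted word.
function quotedWordPattern(quote) {
  // In a string, "\\w" yields the two characters \w that the regex engine expects.
  return new RegExp(quote + '\\w+' + quote, 'g');
}

var literal = /'\w+'/g;
var built = quotedWordPattern("'");

console.log(literal.source === built.source);    // true: same pattern text
console.log("say 'hi' and 'bye'".match(built));  // [ "'hi'", "'bye'" ]
```

The quote character is passed in unescaped here, which is only safe because "'" has no special meaning in a pattern.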

Similarities and differences between match() and exec():

match and exec are common methods for matching strings with regular expressions. The functions implemented by the two are similar, with some subtle differences:

1. Usage

match is a method of String objects; usage: string.match(regexp);

exec is a method of RegExp objects; usage: regexp.exec(string);

2. Returned result

When RegExp does not set the global flag "g":

Both return the same result: null when there is no match, and otherwise an array (call it array). array[0] is the matched string, and array[1], array[2]... hold the substrings $1, $2... captured by the parentheses in the regular expression. The array also has two properties: array.index is the start position of the match, and array.input is the string that was searched.

When RegExp has the global flag "g" set:

match returns an array when there is a match, and null otherwise. Each item is one of the matched strings, in order, so the substrings captured by parentheses are not included. In this case the array has no index or input property.

exec returns the same kind of array as it does without the global flag "g": array[0] is the current match, and array[1], array[2]... are the strings captured by the parentheses for that match. The difference is the lastIndex property of the RegExp object, which records the position just after the end of the current match in the original string; the next call to exec resumes from there. When there are no further matches, exec returns null and lastIndex is reset to 0. You can therefore call exec in a loop to find every match.

A loop that finds multiple matches:

var testStr = "now test001 test002";
var re = /test(\d+)/ig;
var r;
// exec returns null once there are no more matches, which ends the loop
while ((r = re.exec(testStr)) !== null) {
    alert(r[0] + " " + r[1]);
}

You can also use testStr.match(re), but with the g option it returns only the matched strings, without the captured groups. To get the groups from match you must drop the g option, and then only the first match is returned.
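The differences between match and exec can be checked side by side. This is only a sketch built on the test string used above; it uses console.log instead of alert so that it runs outside a browser too.

```javascript
var text = "now test001 test002";

// Without "g": match behaves like exec (full match, groups, index).
var first = text.match(/test(\d+)/i);
console.log(first[0], first[1], first.index); // test001 001 4

// With "g": match lists every full match but drops the groups.
console.log(text.match(/test(\d+)/ig)); // [ 'test001', 'test002' ]

// With "g": exec walks the matches one at a time and keeps the groups.
var re = /test(\d+)/ig;
var groups = [];
var m;
while ((m = re.exec(text)) !== null) {
  groups.push(m[1] + "@" + m.index + "->" + re.lastIndex);
}
console.log(groups);       // [ '001@4->11', '002@12->19' ]
console.log(re.lastIndex); // 0, reset after the final null
```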

1. Regular expression rules

1.1 Ordinary characters

Letters, digits, Chinese characters, underscores, and any punctuation marks not given a special definition in the following chapters are all "ordinary characters". An ordinary character in an expression matches the identical character in the string being matched.

Example 1: When the expression "c" matches the string "abcde", the matching result is: success; the matched content is: "c"; the matched position is: starting at 2 and ending at 3. (Note: whether subscripts start from 0 or 1 may differ depending on the programming language.)

Example 2: When the expression "bcd" matches the string "abcde", the matching result is: success; the matched content is: "bcd"; the matched position is: starting at 1 and ending at 4.

1.2 Simple escape characters

For characters that are inconvenient to write directly, put a "\" in front. We are all familiar with these:

Expression    Can match
\r, \n        Carriage return and line feed characters
\t            Tab character
\\            The "\" character itself

Other punctuation marks have special uses that are described in later chapters. Adding "\" in front of them represents the symbol itself. For example, ^ and $ have special meanings; to match the "^" and "$" characters in a string, the expression needs to be written as "\^" and "\$".

Expression    Can match
\^            The ^ symbol itself
\$            The $ symbol itself
\.            The decimal point (.) itself

These escaped characters match in the same way as "ordinary characters": each one matches the identical character.

Example 1: When the expression "\$d" matches the string "abc$de", the matching result is: success; the matched content is: "$d"; the matched position is: starting at 3 and ending at 5.
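When the text to be matched comes from a variable, the escaping described above can be done programmatically. The function escapeForRegExp below is a hypothetical helper written for this illustration, not a built-in; it puts "\" in front of every character that has a special meaning in a pattern.

```javascript
// Hypothetical helper, not a built-in: escapes every character that has a
// special meaning in a pattern so the text is matched literally.
function escapeForRegExp(text) {
  return text.replace(/[\\^$.*+?()[\]{}|]/g, "\\$&");
}

var price = "$1.50";
var re = new RegExp(escapeForRegExp(price));

console.log(escapeForRegExp(price));     // \$1\.50
console.log(re.test("It costs $1.50"));  // true
console.log(re.test("It costs $1x50"));  // false, "." is no longer a wildcard
```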

1.3 Expressions that can match 'multiple characters'

Some expressions can match any one of a set of characters. For example, the expression "\d" can match any digit. Although it can match any of those characters, it matches only one of them, not several. This is like the jokers in a card game: a joker can stand in for any card, but only for one card.

Expression    Can match
\d            Any digit, any one of 0~9
\w            Any letter, digit or underscore, that is, any one of A~Z, a~z, 0~9, _
\s            Any one whitespace character, such as a space, tab or form feed
.             Any character except the newline character (\n)

Example 1: When the expression "\d\d" matches "abc123", the matching result is: success; the matched content is: "12"; the matched position is: starting at 3 and ending at 5.

Example 2: When the expression "a.\d" matches "aaa100", the matching result is: success; the matched content is: "aa1"; the matched position is: starting at 1 and ending at 4.

1.4 Custom expressions that can match 'multiple characters'

Square brackets [ ] enclose a set of characters and match any one of them. [^ ] encloses a set of characters and matches any one character that is not in the set. As before, although any one of them can be matched, only one character is matched, not several.

Expression    Can match
[ab5@]        "a" or "b" or "5" or "@"
[^abc]        Any character except "a", "b", "c"
[f-k]         Any letter from "f" to "k"
[^A-F0-3]     Any character except "A"~"F" and "0"~"3"

Example 1: When the expression "[bcd][bcd]" matches "abc123", the matching result is: success; the matched content is: "bc"; the matched position is: starting at 1 and ending at 3.

Example 2: When the expression "[^abc]" matches "abc123", the matching result is: success; the matched content is: "1"; the matched position is: starting at 3 and ending at 4.

1.5 Special symbols that modify the number of matches

The expressions described so far, whether they match one specific character or any one of several characters, match only once. Adding a special symbol that modifies the number of matches lets an expression match repeatedly without being written out again.

Usage: put the "repetition modifier" after the "expression being modified". For example, "[bcd][bcd]" can be written as "[bcd]{2}".

Expression    Function
{n}           The expression is repeated n times, for example: "\w{2}" is equivalent to "\w\w"; "a{5}" is equivalent to "aaaaa"
{m,n}         The expression is repeated at least m times and at most n times, for example: "ba{1,3}" can match "ba" or "baa" or "baaa"
{m,}          The expression is repeated at least m times, for example: "\w\d{2,}" can match "a12", "_456", "M12344"...
?             Matches the expression 0 or 1 times, equivalent to {0,1}, for example: "a[cd]?" can match "a", "ac", "ad"
+             The expression appears at least once, equivalent to {1,}, for example: "a+b" can match "ab", "aab", "aaab"...
*             The expression does not appear or appears any number of times, equivalent to {0,}, for example: "\^*b" can match "b", "^^^b"...

Example 1: When the expression "\d+\.?\d*" matches "It costs $12.5", the matching result is: success; the matched content is: "12.5"; the matched position is: starting at 10 and ending at 14.

Example 2: When the expression "go{2,8}gle" matches "Ads by goooooogle", the matching result is: success; the matched content is: "goooooogle"; the matched position is: starting at 7 and ending at 17.
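The positions quoted in these examples can be reproduced in JavaScript. The helper locate is an assumption introduced for this sketch (it is not a standard function); it reports the matched text with its start and end position, counting from 0.

```javascript
// Hypothetical helper: returns [matched text, start, end] or null.
function locate(re, text) {
  var m = re.exec(text);
  return m === null ? null : [m[0], m.index, m.index + m[0].length];
}

console.log(locate(/\d+\.?\d*/, "It costs $12.5"));     // [ '12.5', 10, 14 ]
console.log(locate(/go{2,8}gle/, "Ads by goooooogle")); // [ 'goooooogle', 7, 17 ]
console.log(locate(/[bcd]{2}/, "abc123"));              // [ 'bc', 1, 3 ]
console.log(locate(/a[cd]?/, "ad"));                    // [ 'ad', 0, 2 ]
console.log(locate(/\w\d{2,}/, "M12344"));              // [ 'M12344', 0, 6 ]
```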

1.6 Some other special symbols representing abstract meanings

Some symbols have an abstract, special meaning in expressions:

Expression    Function
^             Matches the beginning of the string; does not match any characters
$             Matches the end of the string; does not match any characters
\b            Matches a word boundary, that is, the position between a word character and a non-word character; does not match any characters

A text description alone is still rather abstract, so examples are provided to help everyone understand.

Example 1: When the expression "^aaa" matches "xxx aaa xxx", the matching result is: failure. Because "^" must match the beginning of the string, "^aaa" can only match when "aaa" is at the beginning of the string, such as: "aaa xxx xxx".

Example 2: When the expression "aaa$" matches "xxx aaa xxx", the matching result is: failure. Because "$" must match the end of the string, "aaa$" can only match when "aaa" is at the end of the string, such as: "xxx xxx aaa".

Example 3: When the expression ".\b." matches "@@@abc", the matching result is: success; the matched content is: "@a"; the matched position is: starting at 2 and ending at 4.
Further explanation: "\b" is similar to "^" and "$". It does not match any character itself, but it requires that, at its position in the match, one side is in the "\w" range and the other side is not.

Example 4: When the expression "\bend\b" matches "weekend,endfor,end", the matching result is: success; the matched content is: "end"; the matched position is: starting at 15 and ending at 18.

Some symbols affect the relationship between subexpressions within an expression:

Expression    Function
|             "OR" relationship between the expressions on its left and right: matches the left side or the right side
( )           (1) When the number of matches is modified, the expression in parentheses is modified as a whole
              (2) When fetching the matching result, the content matched by the expression in parentheses can be obtained separately

Example 5: When the expression "Tom|Jack" matches the string "I'm Tom, he is Jack", the matching result is: success; the matched content is: "Tom"; the matched position is: starting at 4 and ending at 7. When matching the next one, the matching result is: success; the matched content is: "Jack"; the matched position is: starting at 15 and ending at 19.

Example 6: When the expression "(go\s*)+" matches "Let's go go go!", the matching result is: success; the matched content is: "go go go"; the matched position is: starting at 6 and ending at 14.

Example 7: When the expression "¥(\d+\.?\d*)" matches "$10.9,¥20.5", the matching result is: success; the matched content is: "¥20.5"; the matched position is: starting at 6 and ending at 11. The content matched by the parentheses alone is: "20.5".
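A few of these examples, run in JavaScript as a quick check. This is just a sketch; the variable names are arbitrary, and the g flag is added to "Tom|Jack" only so that a second exec call can "match the next one".

```javascript
console.log(/^aaa/.test("xxx aaa xxx")); // false
console.log(/aaa$/.test("xxx xxx aaa")); // true

var end = /\bend\b/.exec("weekend,endfor,end");
console.log(end.index); // 15

// With "g", each exec call continues after the previous match.
var names = /Tom|Jack/g;
var text = "I'm Tom, he is Jack";
var a = names.exec(text);
var b = names.exec(text);
console.log(a[0], a.index, b[0], b.index); // Tom 4 Jack 15

var go = /(go\s*)+/.exec("Let's go go go!");
console.log(JSON.stringify(go[0]), go.index); // "go go go" 6

var yen = /¥(\d+\.?\d*)/.exec("$10.9,¥20.5");
console.log(yen[0], yen[1], yen.index); // ¥20.5 20.5 6
```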

2. Some advanced rules in regular expressions

2.1 Greedy and non-greedy in the number of matches

Several of the special symbols that modify the number of matches allow the same expression to match a varying number of times, such as "{m,n}", "{m,}", "?", "*" and "+"; the actual number of matches depends on the string being matched. An expression that repeats a variable number of times always matches as many times as possible. For example, for the text "dxxxdxxxd":

Expression     Matching result
(d)(\w+)       "\w+" will match all the characters after the first "d": "xxxdxxxd"
(d)(\w+)(d)    "\w+" will match all the characters between the first "d" and the last "d": "xxxdxxx". Although "\w+" could also match the last "d", it "gives up" that "d" so that the entire expression can match successfully

It can be seen that "\w+" always matches as many characters as it can. In the second example it does not match the last "d", but only so that the entire expression can match successfully. In the same way, expressions with "*" and "{m,n}" match as much as possible, and an expression with "?" also "matches" whenever it has the choice between matching and not matching. This matching principle is called the "greedy" mode.
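The greedy results above, together with the non-greedy "?" variants described in the next part, can be compared directly. A minimal sketch that prints only the second group, that is, the part matched by "\w+" or "\w+?":

```javascript
var text = "dxxxdxxxd";

console.log(/(d)(\w+)/.exec(text)[2]);     // xxxdxxxd
console.log(/(d)(\w+)(d)/.exec(text)[2]);  // xxxdxxx
console.log(/(d)(\w+?)/.exec(text)[2]);    // x
console.log(/(d)(\w+?)(d)/.exec(text)[2]); // xxx
```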

Non-greedy mode:

Add a "?" after the special symbol that modifies the number of matches to make an expression with a variable number of matches match as few times as possible, so that an expression that may or may not match will "not match" whenever it can. This matching principle is called "non-greedy" mode, also called "reluctant" mode. If matching fewer times would make the entire expression fail, then, just as in greedy mode, non-greedy mode matches the minimum additional amount needed for the entire expression to succeed. For example, for the text "dxxxdxxxd":

Expression      Matching result
(d)(\w+?)       "\w+?" will match as few characters as possible after the first "d"; the result is: "\w+?" matches only one "x"
(d)(\w+?)(d)    In order for the entire expression to match successfully, "\w+?" must match "xxx" before the following "d" can match. Therefore, the result is: "\w+?" matches "xxx"

More examples:

Example 1: When the expression "<td>(.*)</td>" matches the string "<td><p>aa</p></td> <td><p>bb</p></td>", the matching result is: success; the matched content is the entire string "<td><p>aa</p></td> <td><p>bb</p></td>"; the "</td>" in the expression matches the last "</td>" in the string.

Example 2: In contrast, when the expression "<td>(.*?)</td>" matches the same string as in Example 1, only "<td><p>aa</p></td>" is obtained; when matching the next one, the second "<td><p>bb</p></td>" is obtained.

2.2 Back reference \1, \2...

When an expression is matched, the engine records the string matched by each subexpression enclosed in parentheses "( )". When fetching the result, the string matched by a parenthesized subexpression can be obtained separately. This has been shown several times in the earlier examples. In practice, when a boundary is used for searching but the content wanted does not include the boundary, parentheses must be used to mark the wanted range. For example, the previous "<td>(.*?)</td>".

In fact, "the string matched by a parenthesized subexpression" can be used not only after matching is complete, but also during the matching process. A later part of the expression can refer to an earlier "parenthesized submatch that has already matched a string". The reference is written as "\" plus a number. "\1" refers to the string matched by the first pair of parentheses, "\2" refers to the string matched by the second pair, and so on. If one pair of parentheses contains another pair, the outer pair is numbered first. In other words, the pair whose left parenthesis "(" comes first gets the lower number.

For example:

Example 1: When the expression "('|")(.*?)(\1)" matches " 'Hello', "World" ", the matching result is: success; the matched content is: " 'Hello' ". When matching the next one, it can match " "World" ".

Example 2: When the expression "(\w)\1{4,}" matches "aa bbbb abcdefg ccccc 111121111 999999999", the matching result is: success; the matched content is "ccccc". When matching the next one, you will get 999999999. This expression requires a character in the "\w" range to be repeated at least 5 times. Note the difference from "\w{5,}".

Example 3: When the expression "<(\w+)\s*(\w+(=('|").*?\4)?\s*)*>.*?</\1>" matches "<td id='td1' style="bgcolor:white"></td>", the matching result is: success. If the "<td>" and "</td>" are not paired, the match fails; if they are changed to another matching pair, the match also succeeds.

2.3 Pre-search, no match; reverse pre-search, no match

The previous chapter covered several special symbols that represent abstract meanings: "^", "$", "\b". They have one thing in common: they do not match any characters themselves, but only attach a condition to "the two ends of the string" or "a gap between characters". With this concept in mind, this section introduces another, more flexible way to attach a condition to "the two ends" or "a gap".

Forward pre-search: "(?=xxxxx)", "(?!xxxxx)"

Format: "(?=xxxxx)". In the string being matched, it attaches a condition to the "gap" or "end" where it sits: the right side of the gap must be able to match the expression xxxxx. Because it acts only as a condition on the gap, it does not stop the expressions that follow from actually matching the characters after the gap. This is similar to "\b", which also matches no characters: "\b" only inspects the characters before and after the gap and does not affect how the following expressions match.

Example 1: When the expression "Windows (?=NT|XP)" matches "Windows 98, Windows NT, Windows 2000", it matches only the "Windows " in "Windows NT"; the other "Windows " are not matched.

Example 2: When the expression "(\w)((?=\1\1\1)(\1))+" matches the string "aaa ffffff 999999999", it matches the first 4 of the 6 "f" and the first 7 of the 9 "9". This expression can be read as: when a letter or digit is repeated more than 4 times, match the part before the last 2 characters. Of course, this expression does not have to be written this way; it is used here only for demonstration.

Format: "(?!xxxxx)". The right side of the gap must not be able to match the expression xxxxx.

Example 3: When the expression "((?!\bstop\b).)+" matches "fdjka ljfdl stop fjdsla fdj", it matches from the beginning up to the position before "stop". If the string contains no "stop", the entire string is matched.

Example 4: When the expression "do(?!\w)" matches the string "done, do, dog", it matches only the standalone "do". In this example, using "(?!\w)" after "do" has the same effect as using "\b".

Reverse pre-search: "(?<=xxxxx)", "(?<!xxxxx)"

The concepts of these two formats are similar to forward pre-search. Reverse pre-search places its condition on the "left side" of the gap: the two formats respectively require that the left side must be able to match, or must not be able to match, the specified expression, instead of judging the right side. As with forward pre-search, they are only additional conditions on the gap and do not match any characters themselves.

Example 5: When the expression "(?<=\d{4})\d+(?=\d{4})" matches "1234567890123456", it matches the middle 8 digits, excluding the first 4 and the last 4. Since JScript.RegExp does not support reverse pre-search, this example cannot be demonstrated there (JavaScript engines that implement ECMAScript 2018 or later do support it). Many other engines support reverse pre-search, such as the java.util.regex package in Java 1.4 and above, the System.Text.RegularExpressions namespace in .NET, and the DEELX regular engine recommended by this site.

3. Other general rules

There are also some rules common to the various regular expression engines that were not mentioned in the explanation so far.

3.1 In expressions, you can use "\xXX" and "\uXXXX" to represent a character ("X" represents a hexadecimal digit)

Form      Character range
\xXX      Characters numbered in the range 0 ~ 255, for example: a space can be represented by "\x20"
\uXXXX    Any character can be represented by "\u" plus the 4-digit hexadecimal number of its code, such as: "\u4E2D"

3.2 While the expressions "\s", "\d", "\w" and "\b" have special meanings, the corresponding uppercase letters represent the opposite meanings

Expression    Can match
\S            Any non-whitespace character ("\s" matches each whitespace character)
\D            Any non-digit character
\W            Any character other than letters, digits and underscore
\B            A non-word boundary, that is, a gap between characters where both sides are in the "\w" range or neither side is in the "\w" range

3.3 Summary of the characters that have a special meaning in expressions and need "\" added to match the character itself

Character    Description
^            Matches the beginning of the input string. To match the "^" character itself, use "\^"
$            Matches the end of the input string. To match the "$" character itself, use "\$"
( )          Marks the start and end of a subexpression. To match parentheses, use "\(" and "\)"
[ ]          Used to define custom expressions that match 'multiple characters'. To match square brackets, use "\[" and "\]"
{ }          Symbols that modify the number of matches. To match curly braces, use "\{" and "\}"
.            Matches any character except the newline character (\n). To match the decimal point itself, use "\."
?            Modifies the match count to 0 or 1 times. To match the "?" character itself, use "\?"
+            Modifies the match count to at least 1 time. To match the "+" character itself, use "\+"
*            Modifies the match count to 0 times or any number of times. To match the "*" character itself, use "\*"
|            "OR" relationship between the expressions on its left and right. To match "|" itself, use "\|"

3.4 If you do not want the match of a subexpression in parentheses "( )" to be recorded for later use, use the "(?:xxxxx)" format

Example 1: When the expression "(?:(\w)\1)+" matches "a bbccdd efg", the result is "bbccdd". Matches of the "(?:)" parentheses are not recorded, so "\1" refers to "(\w)".

3.5 Introduction to commonly used expression attribute settings: Ignorecase, Singleline, Multiline, Global

Expression attributes and their descriptions:

Ignorecase: By default, letters in an expression are case-sensitive. Configure Ignorecase to make matching case-insensitive. Some expression engines extend the concept of "case" to upper and lower case across the UNICODE range.

Singleline: By default, the decimal point "." matches any character except the newline (\n). Configuring Singleline lets the decimal point match all characters, including newlines.

Multiline: By default, the expressions "^" and "$" match only the beginning ① and the end ④ of the string, as in:

①xxxxxxxxx②\n
③xxxxxxxxx④

Configuring Multiline makes "^" match not only ① but also position ③, after the newline and before the start of the next line, and makes "$" match not only ④ but also position ②, before the newline at the end of a line.

Global: Mainly takes effect when the expression is used for replacement. If configured as Global, all matches are replaced.
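In JavaScript these attributes correspond to the flags i, m and g. The sketch below is an illustration only; since not every JavaScript engine has a Singleline flag, it uses the class [\s\S] as a stand-in for "a dot that also matches newlines".

```javascript
var text = "First line\nsecond LINE";

console.log(/line/.test("LINE"));   // false: case-sensitive by default
console.log(/line/i.test("LINE"));  // true with Ignorecase (i)

console.log(text.match(/^\w+/g));   // [ 'First' ]
console.log(text.match(/^\w+/gm));  // [ 'First', 'second' ] with Multiline (m)

console.log(text.replace(/line/i, "row"));  // only the first match is replaced
console.log(text.replace(/line/gi, "row")); // Global (g) replaces every match

// "." does not match "\n" by default; [\s\S] is a common stand-in for Singleline.
console.log(/First.*LINE/.test(text));      // false
console.log(/First[\s\S]*LINE/.test(text)); // true
```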

4. Other tips

4.1 If you want to learn about the more complex regular expression syntax that advanced engines support, please refer to the documentation of the DEELX regular engine on this site.

4.2 If you want the expression to match the entire string rather than find a part of it, use "^" and "$" at the beginning and end of the expression, for example: "^\d+$" requires the entire string to consist of digits only.

4.3 If the content to match must be a complete word rather than part of a word, use "\b" at the beginning and end of the expression, for example: use "\b(if|while|else|void|int……)\b" to match keywords in a program.

4.4 The expression should not be able to match the empty string. Otherwise, matching always succeeds, yet nothing is actually matched. For example, when writing an expression to match "123", "123.", "123.5" and ".5", the integer part, the decimal point and the decimal digits can each be omitted, but do not write the expression as "\d*\.?\d*", because this expression also matches successfully when there is nothing at all. A better way to write it is: "\d+\.?\d*|\.\d+".

4.5 Do not let a submatch that can match the empty string repeat without limit. If every part of the subexpression inside the parentheses can match 0 times, and the parentheses as a whole can repeat an unlimited number of times, the situation may be more serious than the previous item: the matching process may loop endlessly. Although some regular expression engines, such as .NET regular expressions, have taken measures to avoid this, we should still try to avoid it. If you run into an infinite loop while writing an expression, this is one of the first causes to check.

4.6 Choose between greedy mode and non-greedy mode sensibly.

4.7 For the left and right sides of "|", it is best if only one side can match any given character, so that the result does not change when the expressions on the two sides of "|" swap positions.
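Tip 4.4 is easy to verify. In this sketch the names loose and strict are arbitrary; both patterns are wrapped in "^" and "$" (tip 4.2), and the alternation is grouped with "(?:...)" so that the anchors apply to both sides of "|".

```javascript
var loose = /^\d*\.?\d*$/;           // every part optional: accepts ""
var strict = /^(?:\d+\.?\d*|\.\d+)$/;

["123", "123.", "123.5", ".5"].forEach(function (s) {
  console.log(s, strict.test(s));    // true for all four
});
console.log(loose.test(""), strict.test(""));   // true false
console.log(loose.test("."), strict.test(".")); // true false
```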

Next ----------------------------------------------

1, Defining regular expressions

1) There are two ways to define a regular expression: the ordinary (literal) way and the constructor way.

2) Ordinary way: var reg=/expression/additional parameters

Expression: a string representing a certain rule, in which certain special characters can be used to represent special rules; these are explained in detail later.

Additional parameters: used to extend the meaning of the expression. Currently there are three main parameters:
g: global matching.
i: case-insensitive matching.
m: multi-line matching.

The three parameters can be combined freely to express a compound meaning, and of course they can also be omitted.

Example:
var reg=/a*b/;
var reg=/abc+f/g;

3) Constructor way: var reg=new RegExp("expression", "additional parameters");
The meanings of "expression" and "additional parameters" are the same as in the ordinary way above.
Example:
var reg=new RegExp("a*b");
var reg=new RegExp("abc+f","g");

4) Difference between the ordinary way and the constructor way
The expression in the ordinary way must be a constant pattern, while the expression passed to the constructor can be a constant string or a JS variable, for example an expression taken from user input:
var reg=new RegExp(document.forms[0].exprfiled.value,"g");

2, Expression patterns

1) An expression pattern is the way an expression is written, that is, how the "expression" in var reg=/expression/additional parameters is described.
2) Broadly speaking, expression patterns are divided into simple patterns and compound patterns.
3) Simple pattern: a pattern made up only of ordinary characters, for example
var reg=/abc0d/;
A simple pattern can only express one specific match.

4) Compound pattern: a pattern that contains wildcard characters, for example:
var reg=/a+b?\w/;
Here +, ? and \w are all wildcard characters with special meanings. Compound patterns can therefore express more abstract logic.
The rest of this section focuses on the meaning and use of each wildcard character in compound patterns.

5) Special characters in compound patterns:

1>\: used as an escape character in many programming languages. Generally speaking,
if the \ symbol is followed by an ordinary character c, then \c has a special meaning. For example, n normally represents the character n, but \n represents a newline.
If the \ symbol is followed by a special character c, then \c represents the ordinary character c. For example, \ is normally the escape character, but \\ represents the ordinary character \.
The usage of \ in JavaScript regular expressions is the same as above, although the table of special characters may differ between programming languages.

2>^: matches the start of the input string. In multi-line matching, that is, when the additional parameters contain m, it also matches right after a newline.
Example:
/^B/ matches the first B in "Bab Bc "

Example 2:
/^B/gm matches, in
"Badd B
cdaf
B dsfB"

the first B of the first line and the first B of the third line

3>$: matches the end of the input string. In multi-line matching, that is, when the additional parameters contain m, it also matches right before a newline.

It is the counterpart of ^.

Example: /t$/ matches the t in "bat", but does not match the t in "hate"

Example 2: /t$/gm matches, in

"tag at
bat"
the last t of the first line and the t of the second line.

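4>*: matches the preceding character 0 or more times.

Example: /ab*/ matches "abbbb" in "dddabbbbc", and also matches "a" in "ddda"

5>+: matches the preceding character 1 or more times.

Example: /ab+/ matches "abbbb" in "dddabbbbc", but does not match "ddda"

It is similar in use to {1,} described later (general form: {n,})

6>?: the usage of ? is special. Generally it matches the preceding character 0 or 1 times, but it has two other special uses:

If it immediately follows *, +, ? or { }, it makes that quantifier match the minimum number of times, for example:
/ba*/ matches the whole of "baaaa", but /ba*?/ matches only the "b" in "baaaa" (because * means 0 or more times, and the added ? asks for the fewest possible, that is, 0 times).
Similarly, /ba+?/ matches "ba" in "baaaa".
It is also used as a syntax marker in lookahead assertions, that is, x(?=y) and x(?!y) described later

7>.: the "." (decimal point) matches any single character except a newline.

For example: /a.b/ matches "acb" in "acbaa", but does not match "ab", because there must be exactly one character between a and b.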

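8>(x): matches x (x is not specifically the character x or a single character; it stands for any subpattern) and remembers the match. In the syntax these ( ) are called "capturing parentheses".

The match is remembered because some of the functions provided for expressions return an array that holds all the matched strings, for example the exec() function.
Note also that the x in ( ) is remembered only if x is actually matched.

Example 1:

var regx=/a(b)c/;
var rs=regx.exec("abcddd");

As shown above, /a(b)c/ matches "abc" in "abcddd", and because of the ( ), b is recorded as well, so the array returned in rs is:
{abc,b}

Example 2:

var regx=/a(b)c/;
var rs=regx.exec("acbcddd");

rs is null, because /a(b)c/ does not match "acbcddd", so the b in ( ) is not recorded (even though the string contains b)

9>(?:x): matches x but does not remember it. The ( ) in this form are called "non-capturing parentheses".

Example:

var regx=/a(?:b)c/;
var rs=regx.exec("abcddd");

As shown above, /a(?:b)c/ matches "abc" in "abcddd", and because of the (?:), b is not recorded, so the array returned in rs is:
{abc}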
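The difference between the two kinds of parentheses shows up in the length of the array returned by exec. A minimal sketch (the variable names are arbitrary):

```javascript
var withCapture = /a(b)c/.exec("abcddd");
var withoutCapture = /a(?:b)c/.exec("abcddd");

console.log(withCapture.length, withCapture[0], withCapture[1]); // 2 abc b
console.log(withoutCapture.length, withoutCapture[0]);           // 1 abc
console.log(/a(b)c/.exec("acbcddd"));                            // null
```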

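10>x(?=y): matches x only when it is immediately followed by y. On a successful match, only x is remembered; y is not.

Example:

var regx=/user(?=name)/;
var rs=regx.exec("The username is Mary");

Result: the match succeeds, and the value of rs is {user}

11>x(?!y): matches x only when it is not immediately followed by y. On a successful match, only x is remembered; y is not.

Example:

var regx=/user(?!name)/;
var rs=regx.exec("The user name is Mary");

Result: the match succeeds, and the value of rs is {user}

Example 2:

var regx=/\d+(?!\.)/;
var rs=regx.exec("54.235");

Result: the match succeeds, and the value of rs is {5}. It does not match 54 because 54 is followed by ".". 235 would also match, but because exec returns only the first match, 235 is not returned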
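The lookahead examples can be run as follows. This is a sketch only; the last line adds the g flag and uses match to show that 235 is indeed a second match, which is an extension of the example rather than something the example above does.

```javascript
var a = /user(?=name)/.exec("The username is Mary");
console.log(a[0], a.index); // user 4

console.log(/user(?=name)/.test("The user name is Mary")); // false
console.log(/user(?!name)/.test("The user name is Mary")); // true

console.log(/\d+(?!\.)/.exec("54.235")[0]); // 5
console.log("54.235".match(/\d+(?!\.)/g));  // [ '5', '235' ]
```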

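12>x|y: matches x or y. Note that if both x and y occur in the string, only the one that is found first is remembered.

Example:

var regx=/beijing|shanghai/;
var rs=regx.exec("I love beijing and shanghai");

Result: the match succeeds and the value of rs is {beijing}; shanghai also matches, but it is not remembered.

13>{n}: matches exactly n occurrences of the preceding character.
n must be a non-negative integer, although a negative number or a decimal does not raise a syntax error either.

Example:

var regx=/ab{2}c/;
var rs=regx.exec("abbcd");

Result: the match succeeds and the value of rs is: {abbc}.

14>{n,}: matches at least n occurrences of the preceding character.

Example:

var regx=/ab{2,}c/;
var rs=regx.exec("abbcdabbbc");

Result: the match succeeds and the value of rs is: {abbc}. As for why abbbc, which also satisfies the pattern, is not remembered: this has to do with the behavior of the exec method, which is explained later.

15>{n,m}: matches at least n and at most m occurrences of the preceding character.
As long as n and m are numbers and m>=n, no syntax error is raised.

Example:

var regx=/ab{2,5}c/;
var rs=regx.exec("abbbcd");

Result: the match succeeds and the value of rs is: {abbbc}.

Example 2:

var regx=/ab{2,2}c/;
var rs=regx.exec("abbcd");

Result: the match succeeds and the value of rs is: {abbc}.

Example 3:

var regx=/ab{2,5}/;
var rs=regx.exec("abbbbbbbbbb");

Result: the match succeeds and the value of rs is: {abbbbb}. This shows that if the preceding character occurs more than m times, only m occurrences are matched. In addition:
var regx=/ab{2,5}c/;
var rs=regx.exec("abbbbbbbbbbc");
Result: the match fails and the value of rs is: null. It fails because there are more than 5 b's: b{2,5} can match at most the first 5 b's, and in the expression /ab{2,5}c/ the b's must be followed by c, but in the string the 5 b's are followed by another b.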

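16>[xyz]: xyz stands for a set of characters; this pattern matches any one character in the [ ]. [xyz] is equivalent to [x-z].

Example:

var regx=/a[bc]d/;
var rs=regx.exec("abddgg");

Result: the match succeeds and the value of rs is: {abd}

Example 2:

var regx=/a[bc]d/;
var rs=regx.exec("abcd");

Result: the match fails and the value of rs is: null. It fails because [bc] matches one of b or c, but not both at once.

17>[^xyz]: this pattern matches any one character that is not in the [ ]. [^xyz] is equivalent to [^x-z].

Example:

var regx=/a[^bc]d/;
var rs=regx.exec("afddgg");

Result: the match succeeds and the value of rs is: {afd}

Example 2:

var regx=/a[^bc]d/;
var rs=regx.exec("abd");

Result: the match fails and the value of rs is: null.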

18>[/b]:匹配退格键。

19>/b:匹配一个词的边界符,例如空格和换行符等等,当然匹配换行符时,表达式应该附加参数m。

例子:

var regx=//bc./; var rs=regx.exec(“Beijing is a beautiful city”);
Copy after login

结果:匹配成功,rs的值为:{ci},注意c前边的空格不会匹配到结果中,即{ ci}是不正确的。

20>\B: matches a position that is not a word boundary.

Example:

var regx = /\Bi./;
var rs = regx.exec("Beijing is a beautiful city");

Result: the match succeeds, and rs is {ij}, that is, the ij inside Beijing.
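Looking at the index of each match makes it clearer that \b and \B match positions rather than characters. A small sketch using the same sentence (the last check uses a made-up string):

```javascript
const sentence = "Beijing is a beautiful city";

const atBoundary = /\bc./.exec(sentence);
const insideWord = /\Bi./.exec(sentence);

console.log(atBoundary[0], atBoundary.index); // "ci" 23
console.log(insideWord[0], insideWord.index); // "ij" 2

// A line break is a non-word character, so \b works there without the m flag.
const boundaryAtNewline = /\bis\b/.test("this\nis\nfine");
console.log(boundaryAtNewline); // true
```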

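21>\cX: matches a control character. For example, \cM matches Control-M, i.e. a carriage return. X must be one of A-Z or a-z; otherwise c is treated as a literal 'c' character. (A practical example still needs to be added.)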
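One possible example, not taken from the original: Control-M, Control-J and Control-I correspond to carriage return, line feed and tab, so each \cX escape should match the usual string escape.

```javascript
// \cM is U+000D, \cJ is U+000A, \cI is U+0009.
const matchesCR = /\cM/.test("\r");
const matchesLF = /\cJ/.test("\n");
const matchesTab = /\cI/.test("\t");

console.log(matchesCR, matchesLF, matchesTab); // true true true
```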

21>\d: matches one digit character, equivalent to [0-9].

Example:

var regx = /user\d/;
var rs = regx.exec("user1");

Result: the match succeeds, and rs is {user1}.

22>\D: matches one non-digit character, equivalent to [^0-9].
Example:

var regx = /user\D/;
var rs = regx.exec("userA");

Result: the match succeeds, and rs is {userA}.

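23>\f: matches a form feed character.

24>\n: matches a line feed character. The m flag is not required for this; m only changes how ^ and $ behave.
Example:

var regx = /a\nbc/m;
var str = "a\nbc";
var rs = regx.exec(str);

Result: the match succeeds, and rs is {a\nbc} (a, a line feed, then bc). If the expression is /a\n\rbc/, the string is not matched, because in typical editors one press of Enter means "carriage return + line feed" (\r\n), not "line feed + carriage return" (\n\r), at least in a textarea field.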
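A short sketch of how \n behaves with and without the m flag, and one way to accept both line-ending styles. The pattern /a\r?\nbc/ is an illustration, not something from the original text.

```javascript
const twoLines = "a\nbc";

// \n matches a line feed with or without the m flag.
const plainMatch = /a\nbc/.exec(twoLines);
const flaggedMatch = /a\nbc/m.exec(twoLines);
console.log(plainMatch[0] === "a\nbc");   // true
console.log(flaggedMatch[0] === "a\nbc"); // true

// A string that uses "\r\n" needs the carriage return in the pattern too.
const crlfFails = /a\nbc/.test("a\r\nbc");
const eitherEnding = /a\r?\nbc/.test("a\r\nbc") && /a\r?\nbc/.test("a\nbc");
console.log(crlfFails, eitherEnding); // false true
```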

25>\r: matches a carriage return character.

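26>\s: matches one whitespace character, roughly equivalent to [ \f\n\r\t\v\u00A0\u2028\u2029] (current engines also include other Unicode space characters).

Example:

var regx = /\si/;
var rs = regx.exec("Beijing is a city");

Result: the match succeeds, and rs is { i}.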

27>\S: matches one non-whitespace character, roughly equivalent to [^ \f\n\r\t\v\u00A0\u2028\u2029].

Example:

var regx = /\Si/;
var rs = regx.exec("Beijing is a city");

Result: the match succeeds, and rs is {ei}.

28>\t: matches a tab character.

Example:

var regx = /a\tb/;
var rs = regx.exec("a\tbc");

Result: the match succeeds, and rs is {a\tb} (a, a tab, then b; the trailing c is not part of the match).

29>\v: matches a vertical tab character.

30>\w: matches one digit, underscore, or letter, i.e. [A-Za-z0-9_].

Example:

var regx = /\w/;
var rs = regx.exec("$25.23");

Result: the match succeeds, and rs is {2}.

31>\W: matches one character that is not a digit, underscore, or letter, i.e. [^A-Za-z0-9_].

Example:

var regx = /\W/;
var rs = regx.exec("$25.23");

Result: the match succeeds, and rs is {$}.

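32>\n (backreference): note that this is not the line feed \n. Here n is a positive integer, and the pattern matches the same text that was captured by the n-th pair of parentheses.

Example:

var regx = /user([,-])group\1role/;
var rs = regx.exec("user-group-role");

Result: the match succeeds, and rs is {user-group-role, -}. The string user,group,role also matches, but strings such as user-group,role do not.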
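The backreference rule can be verified with a few inputs. A minimal sketch using the pattern from the example:

```javascript
// \1 must repeat exactly what the first group captured.
const sameSeparator = /user([,-])group\1role/;

const dashResult = sameSeparator.exec("user-group-role");
console.log(dashResult[0], dashResult[1]); // "user-group-role" "-"

const commaOk = sameSeparator.test("user,group,role");
const mixedFails = sameSeparator.test("user-group,role");
console.log(commaOk, mixedFails); // true false
```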

33>\0: matches a NUL character.

34>\xhh: matches the character denoted by two hexadecimal digits.

35>\uhhhh: matches the character denoted by four hexadecimal digits.
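A brief sketch of the three escapes; the sample characters are chosen only for illustration.

```javascript
// \x41 is "A", \u4e2d is the character "中", \0 is the NUL character.
const hexEscape = /\x41/.test("A");
const unicodeEscape = /\u4e2d/.test("中");
const nulEscape = /a\0b/.test("a\u0000b");

console.log(hexEscape, unicodeEscape, nulEscape); // true true true
```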

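3. Expression operations

1) "Expression operations" here means the methods related to regular expressions; we will introduce six of them.
2) Methods of the expression object (RegExp):

1>exec(str): returns the first substring of str that matches the expression, in the form of an array. If the expression contains capturing parentheses, the returned array also contains the text matched by each (). For example:

var regx = /\d+/;
var rs = regx.exec("3432ddf53");

The returned rs is {3432}.

var regx2 = new RegExp("ab(\\d+)c");
var rs2 = regx2.exec("ab234c44");

The returned rs2 is {ab234c, 234}.
In addition, when there are several matches, the first call to exec returns the first match, and further calls return the second, the third, and so on. For example:

var regx = /user\d/g;
var rs = regx.exec("ddduser1dsfuser2dd");
var rs1 = regx.exec("ddduser1dsfuser2dd");

Then rs is {user1} and rs1 is {user2}. Note that the g flag on regx is required; without it, exec returns the first match no matter how many times it is called. This behavior is explained further below.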
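Calling exec repeatedly is usually written as a loop. The helper below is a hypothetical sketch (collectGroups is not a built-in); it assumes a pattern with one capturing group and refuses regexes without the g flag, since those would loop forever.

```javascript
// Walk through all matches of a global regex and record details of each.
function collectGroups(regex, text) {
  if (!regex.global) {
    throw new Error("collectGroups needs a regex with the g flag");
  }
  const results = [];
  let m;
  regex.lastIndex = 0; // always start from the beginning of the text
  while ((m = regex.exec(text)) !== null) {
    results.push({ match: m[0], group: m[1], index: m.index });
    if (m[0] === "") regex.lastIndex++; // avoid stalling on empty matches
  }
  return results;
}

const userHits = collectGroups(/user(\d)/g, "ddduser1dsfuser2dd");
console.log(userHits);
// [ { match: "user1", group: "1", index: 3 },
//   { match: "user2", group: "2", index: 11 } ]
```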

2>test(str): checks whether the string str matches the expression and returns a boolean. For example:

var regx = /user\d+/g;
var flag = regx.test("user12dd");

flag is true.

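3) Methods of the String object

1>match(expr): returns an array of the strings that match expr. Without the g flag it returns the first match; with the g flag it returns all matches.
Example:

var regx = /user\d/g;
var str = "user13userddduser345";
var rs = str.match(regx);

rs is {user1, user3}.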

2>search(expr): returns the index of the first match of expr in the string.
Example:

var regx = /user\d/g;
var str = "user13userddduser345";
var rs = str.search(regx);

rs is 0.

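3>replace(expr, str): replaces the parts of the string that match expr with str. In the replace method, str may also contain the variable symbol $ in the form $n, which stands for the n-th remembered submatch (recall that parentheses remember what they match).
Example:

var regx = /user\d/g;
var str = "user13userddduser345";
var rs = str.replace(regx, "00");

rs is 003userddd0045.
Example 2:

var regx = /u(se)r\d/g;
var str = "user13userddduser345";
var rs = str.replace(regx, "$1");

rs is se3userdddse45.
One more point about replace(expr, str) deserves attention: if expr is an expression object, the replacement is global only when the expression carries the g flag (otherwise only the first match is replaced); if expr is a string, only the first matching part is replaced. For example:

var regx = "user";
var str = "user13userddduser345";
var rs = str.replace(regx, "00");

rs is 0013userddduser345.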
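The $n placeholders can also reorder captured text. A small sketch with made-up input strings:

```javascript
// $1, $2, $3 refer to the first, second and third capture groups.
const reordered = "2018-06-30".replace(/(\d+)-(\d+)-(\d+)/, "$3/$2/$1");
console.log(reordered); // "30/06/2018"

// A string pattern replaces only the first occurrence;
// a regex with the g flag replaces every occurrence.
const firstOnly = "a-b-c".replace("-", "+");
const everyOne = "a-b-c".replace(/-/g, "+");
console.log(firstOnly, everyOne); // "a+b-c" "a+b+c"
```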

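4>split(expr): splits the string at the parts that match expr and returns an array. Whether or not the expression has the g flag makes no difference; the result is the same.
Example:

var regx = /user\d/g;
var str = "user13userddduser345";
var rs = str.split(regx);

rs is {"", 3userddd, 45}. The first element is an empty string because the string starts with a match.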

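4. Expression-related properties

1) Expression-related properties are the properties associated with an expression, as in the following form:

var regx = /myexpr/;
var rs = regx.exec(str);

Two properties belong to the expression regx itself, and three belong to the match result rs. They are introduced one by one below.
2) The two properties of the expression itself:

1>lastIndex: the position at which the next match will start. Note that lastIndex only advances from match to match when the expression is global (has the g flag). For example:

var regx = /user\d/;
var rs = regx.exec("sdsfuser1dfsfuser2");
var lastIndex1 = regx.lastIndex;
rs = regx.exec("sdsfuser1dfsfuser2");
var lastIndex2 = regx.lastIndex;
rs = regx.exec("sdsfuser1dfsfuser2");
var lastIndex3 = regx.lastIndex;

Without the g flag, standards-compliant engines do not update lastIndex, so lastIndex1, lastIndex2 and lastIndex3 are all 0 (some older browsers, such as legacy Internet Explorer, reported 9 each time). If regx = /user\d/g, the first is 9, the second is 18, and the third is 0.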
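The lastIndex values can be traced with a small helper. traceLastIndex is a hypothetical name used only for this sketch; the values in the comments are what a standards-compliant engine such as a current Node.js release produces.

```javascript
// Call exec several times and record lastIndex after each call.
function traceLastIndex(regex, text, calls) {
  const seen = [];
  for (let i = 0; i < calls; i++) {
    regex.exec(text);
    seen.push(regex.lastIndex);
  }
  return seen;
}

const haystack = "sdsfuser1dfsfuser2";
const globalTrace = traceLastIndex(/user\d/g, haystack, 3);
const plainTrace = traceLastIndex(/user\d/, haystack, 3);

console.log(globalTrace); // [9, 18, 0]
console.log(plainTrace);  // [0, 0, 0]
```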

2>source: returns the expression text itself. For example:

var regx = /user\d/;
var rs = regx.exec("sdsfuser1dfsfuser2");
var source = regx.source;

source is user\d.

3) The three properties of the match result:

1>index: the position of the current match. For example:

var regx = /user\d/;
var rs = regx.exec("sdsfuser1dfsfuser2");
var index1 = rs.index;
rs = regx.exec("sdsfuser1dfsfuser2");
var index2 = rs.index;
rs = regx.exec("sdsfuser1dfsfuser2");
var index3 = rs.index;

index1 is 4, index2 is 4, and index3 is 4. If the g flag is added, index1 is 4, index2 is 13, and reading index3 throws an error, because the third exec returns null and null has no index property.

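2>input: the string that was searched. For example:

var regx = /user\d/;
var rs = regx.exec("sdsfuser1dfsfuser2");
var input = rs.input;

input is sdsfuser1dfsfuser2.

3>[0]: the first value in the match result. For match, the result may be an array with several values, in which case [1], [2] and so on can be read in addition to [0]. For example:

var regx = /user\d/g;
var rs = regx.exec("sdsfuser1dfsfuser2");
var value1 = rs[0];
rs = regx.exec("sdsfuser1dfsfuser2");
var value2 = rs[0];

value1 is user1 and value2 is user2 (the g flag is needed here; without it both values would be user1).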

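5. Practical applications

1) Practical application 1
Description: a form contains a "user name" input field.
Requirement: the value must consist of Chinese characters, no fewer than 2 and no more than 4.
Implementation:
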
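The original implementation is not available here, so the following is only a sketch under stated assumptions: isValidChineseName is a hypothetical helper, and the range \u4e00-\u9fa5 is a commonly used approximation of "Chinese characters" that does not cover every CJK character.

```javascript
// Accept a value made of 2 to 4 characters from the common CJK range.
function isValidChineseName(name) {
  return /^[\u4e00-\u9fa5]{2,4}$/.test(name);
}

console.log(isValidChineseName("张三"));       // true
console.log(isValidChineseName("张"));         // false, too short
console.log(isValidChineseName("张三李四王")); // false, too long
console.log(isValidChineseName("ab"));         // false, not Chinese characters
```

In a page, such a function would typically be called with the value of the input field before the form is submitted.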

2) Practical application 2

Description: given a string that contains HTML tags, remove the HTML tags from it.

Implementation:

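The original implementation is not available here either. One simple approach, shown as a sketch with a hypothetical stripTags helper, is to replace everything between "<" and the next ">" with an empty string. It is a text cleanup for well-formed markup, not a security sanitizer.

```javascript
// Remove anything that looks like an HTML tag.
function stripTags(html) {
  return html.replace(/<[^>]*>/g, "");
}

console.log(stripTags("<p>Hello <b>world</b></p>")); // "Hello world"
```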

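III. Summary

1. Regular expressions in JavaScript are probably not widely used by most programmers, because the pages we deal with are usually not very complex and the complicated logic is usually handled on the back end. That trend is now reversing: rich clients are accepted by more and more people, and JavaScript is the key technology behind them. For complex client-side logic, regular expressions play an important role, and they are one of the skills a JavaScript expert must master.

2. To give a more complete and deeper understanding of what was covered above, I will summarize the key points and the places where it is easy to get confused. This part is important!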

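Summary 1: using the g flag

Adding the g flag to an expression means that global matching can be performed. Note the meaning of "can" here; the details follow:

1) For the exec method of the expression object: without g, only the first match is returned, no matter how many times it is called. With g, the first call also returns the first match, the next call returns the second match, and so on. For example:

var regx = /user\d/;
var str = "user18dsdfuser2dsfsd";
var rs = regx.exec(str);  // rs is {user1}
var rs2 = regx.exec(str); // rs2 is still {user1}

If regx = /user\d/g, then rs is {user1} and rs2 is {user2}.

This example shows that, for exec, adding g does not mean that one call returns all matches. It means that all matches can be obtained in some way, and for exec that way is to call the method repeatedly.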

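2) For the test method of the expression object, a single call gives the same result with or without g. With g, however, test also updates lastIndex, so repeated calls on the same expression object can return different results.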
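The effect of the g flag on repeated test calls can be seen in a short sketch (variable names are illustrative):

```javascript
const statefulTest = /user\d/g;

const firstCall = statefulTest.test("user1");  // match found, lastIndex becomes 5
const secondCall = statefulTest.test("user1"); // search starts at 5, nothing found
const thirdCall = statefulTest.test("user1");  // lastIndex was reset to 0

console.log(firstCall, secondCall, thirdCall); // true false true

// Without g, every call starts from the beginning.
const statelessTest = /user\d/;
const alwaysTrue = statelessTest.test("user1") && statelessTest.test("user1");
console.log(alwaysTrue); // true
```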

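3) For the match method of the String object: without g, only the first match is returned, and calling match again always returns the first match. With g, all matches are returned at once (note that this differs from exec, which never returns all matches in one call even with g). For example:

var regx = /user\d/;
var str = "user1sdfsffuser2dfsdf";
var rs = str.match(regx);  // rs is {user1}
var rs2 = str.match(regx); // rs2 is still {user1}

If regx = /user\d/g, then rs is {user1, user2} and rs2 is also {user1, user2}.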

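4) For the replace method of the String object: without g, only the first match is replaced; with g, all matches are replaced. (The three quiz questions at the beginning illustrate this well.)

5) For the split method of the String object, the result is the same with or without g, that is:

var sep = /user\d/;
var array = "user1dfsfuser2dfsf".split(sep);

Then array is {"", dfsf, dfsf}; the leading empty string appears because the string starts with a separator.

With sep = /user\d/g the return value is the same.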
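A quick check of the split behavior, including the leading empty string; filtering it out is shown as one possible cleanup step.

```javascript
const source = "user1dfsfuser2dfsf";

const partsPlain = source.split(/user\d/);
const partsGlobal = source.split(/user\d/g);

console.log(partsPlain);  // ["", "dfsf", "dfsf"]
console.log(partsGlobal); // ["", "dfsf", "dfsf"]

// Drop empty pieces if they are not wanted.
const nonEmptyParts = partsPlain.filter((part) => part !== "");
console.log(nonEmptyParts); // ["dfsf", "dfsf"]
```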

6) For the search method of the String object, the result is also the same with or without g.

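Summary 2: using the m flag

The m flag enables multi-line matching, but it only has an effect when ^ and $ are used. With other patterns, matching spans multiple lines whether or not m is added (a multi-line string is, after all, just an ordinary string). Some examples:

1) An example using ^

var regx = /^b./g;
var str = "bd76 dfsdf\nsdfsdfs dffs\nb76dsf sdfsdf";
var rs = str.match(regx);

Here, with or without g, only the first match {bd} is returned. If regx = /^b./gm, all matches {bd, b7} are returned. Note that if regx = /^b./m, only the first match is returned as well. So m means that multi-line matching can be performed, g means that global matching can be performed, and together they allow multi-line global matching.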

2) An example using other patterns:

var regx = /user\d/;
var str = "sdfsfsdfsdf\nsdfsuser3 dffs\nb76dsf user6";
var rs = str.match(regx);

Without the g flag this returns {user3}; with the g flag it returns {user3, user6}. Adding or omitting m has no effect here.

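3) So the use of m must be clearly understood: it only affects ^ and $. Without m, ^ matches only at the start of the whole string and $ only at its end; with m, they also match at the start and end of every line. Here is another ^ example:

var regx = /^b./;
var str = "ret76 dfsdf\nbjfsdfs dffs\nb76dsf sdfsdf";
var rs = str.match(regx);

Here rs is null. If g is added, rs is still null. If m is added, rs is {bj} (no match is found on the first line, but because of the m flag the search continues on the following lines). If both m and g are added, {bj, b7} is returned. With m but without g, matching can take place on multiple lines but stops after the first match; adding g returns all matches from all lines. That is how the match method behaves; with exec, the method has to be called several times to get the matches one by one.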
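The four flag combinations can be compared directly. This sketch assumes the sample string has line breaks before "bjfsdfs" and before "b76dsf", which is what the stated results imply.

```javascript
const multiLineText = "ret76 dfsdf\nbjfsdfs dffs\nb76dsf sdfsdf";

const noFlags = multiLineText.match(/^b./);    // ^ only matches at index 0
const onlyG = multiLineText.match(/^b./g);     // still anchored to index 0
const onlyM = multiLineText.match(/^b./m);     // first line start that fits
const bothFlags = multiLineText.match(/^b./gm); // every line start that fits

console.log(noFlags);   // null
console.log(onlyG);     // null
console.log(onlyM[0]);  // "bj"
console.log(bothFlags); // ["bj", "b7"]
```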

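Summary 3:

In an HTML textarea field, pressing Enter once produces the control characters "\r\n", that is, "carriage return + line feed", not "\n\r", "line feed + carriage return". Let us revisit an earlier example:

var regx = /a\r\nbc/;
var str = "a\r\nbc";
var rs = regx.exec(str);

Result: the match succeeds, and rs is {a\r\nbc}. If the expression is /a\n\rbc/, the string is not matched, because in typical editors one press of Enter means "carriage return + line feed" rather than "line feed + carriage return", at least in a textarea field. (Current browsers normalize line breaks in a textarea's value to "\n", so this applies to environments that keep "\r\n".)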
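Because the line-ending style depends on where the text comes from, one defensive option is to normalize line endings before matching. normalizeNewlines is a hypothetical helper for this sketch.

```javascript
// Convert "\r\n" and lone "\r" to "\n" so one pattern fits all inputs.
function normalizeNewlines(text) {
  return text.replace(/\r\n?/g, "\n");
}

const fromWindows = normalizeNewlines("a\r\nbc");
const fromUnix = normalizeNewlines("a\nbc");

console.log(/a\nbc/.test(fromWindows)); // true
console.log(/a\nbc/.test(fromUnix));    // true
```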

That is all for this article; I hope it helps with your studies.

source:php.cn