Common mistakes for newbies learning regular expressions-Common Problem-php.cn

Common mistakes for newbies learning regular expressions

angryTom

Released: 2019-11-09 17:42:56

The advantage of regular expressions is that they are easy to pick up: after a few hours of study you can read most patterns. Reading them is one thing, though. In practice you will still often get results you did not expect, because the syntax of regular expressions is a little unusual. This article collects some mistakes that are often made while learning regular expressions.

1. Space

When writing code, we usually use spaces to keep it tidy. Together with appropriate indentation and tabs at the start of lines, they make the code look clearer. In a regular expression, however, you have to be careful, because a space is itself a character to be matched. If you use spaces inappropriately:

echo preg_match('/a{1, 3}/', "aaa") ? 'match' : 'no match';  // no match

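For example, the regular expression above is intended to match one to three a's, but it does not match the string of three a's. The extra space inside {1, 3} strips "{}" of its meaning as a metacharacter, so the braces and their contents become ordinary characters. (This depends on the PCRE version: newer PCRE2 releases, 10.43 onward as bundled with PHP 8.4, reportedly follow Perl in allowing spaces inside the braces, so results there may differ.)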

echo preg_match('/a{1, 3}/', "a{1, 3}") ? 'match' : 'no match';  // match

The literal string "a{1, 3}" is matched instead, which is obviously not what we want. So unless you actually mean to match a space character, do not put spaces in the pattern:

echo preg_match('/a{1,3}/', "aaa") ? 'match' : 'no match';  // match
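
The same rule applies anywhere in a pattern, not only inside braces. The following is a minimal sketch with made-up sample strings ("foo bar" and "foobar" are illustrative only): a space in the pattern matches only a space in the subject, and optional whitespace has to be stated explicitly, here with \s*.

```php
<?php
// A space in the pattern only matches a space in the subject.
$withSpace    = preg_match('/foo bar/', 'foo bar');  // the subject contains the space
$withoutSpace = preg_match('/foo bar/', 'foobar');   // the subject has no space

// If optional whitespace is what you mean, say so explicitly with \s*.
$optional = preg_match('/foo\s*bar/', 'foobar');

echo $withSpace, $withoutSpace, $optional, "\n";     // prints 101
```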

※ Exception: the pattern modifier "x" makes the engine ignore whitespace in the pattern. Patterns written this way are harder to understand, so using it is not recommended:

echo preg_match('/a a a/x', "aaa") ? 'match' : 'no match';  // match
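
For readers who do want to try the "x" modifier, here is a small sketch of how it is typically used. The date-like pattern and the variable names are assumptions made for illustration, not part of the original article. Under "x", unescaped whitespace is skipped and "#" starts a comment that runs to the end of the line, so a real space has to be escaped.

```php
<?php
// With /x, unescaped whitespace in the pattern is ignored and # starts a comment.
$pattern = '/
    \d{4}   # year
    -
    \d{2}   # month
/x';
$extended = preg_match($pattern, '2019-11');

// Under /x the pattern "a b" is effectively "ab", so it no longer matches "a b".
$ignored = preg_match('/a b/x', 'a b');

// To match a real space under /x, escape it.
$escaped = preg_match('/a\ b/x', 'a b');

echo $extended, $ignored, $escaped, "\n";  // prints 101
```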

2. Capitalization

This one is easy to understand and is basically a careless mistake. When we look for text in a search tool, uppercase and lowercase letters are usually both matched, so it is easy to forget that regular expressions do not ignore case automatically:

echo preg_match('/flag/', "Flag") ? 'match' : 'no match';  // no match

The string being matched may have its first letter capitalized, in which case there is naturally no match, so both upper and lower case have to be taken into account. But if we simply want to match a word whenever its four letters appear together, however they are cased, the pattern becomes tedious to write:

echo preg_match('/[Ff][Ll][Aa][Gg]/', "Flag") ? 'match' : 'no match';  // match

It is hard to imagine anyone writing something as odd as "fLaG", but without a pattern like this not every case is covered. When case does not matter and the string to match is long, writing it out this way would be exhausting. Fortunately, there is the "i" modifier:

echo preg_match('/flag/i', "Flag") ? 'match' : 'no match';  // match

When the "i" modifier is set, letters in the pattern match both their uppercase and lowercase forms.
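
As a quick check of the difference, the sketch below runs several spellings of "flag" through the case-sensitive and case-insensitive versions of the same pattern. The list of spellings and the variable names are chosen for illustration. The last line uses the inline option (?i), which PCRE also accepts inside the pattern itself.

```php
<?php
// Each spelling is tested against the case-sensitive and the
// case-insensitive version of the same pattern.
$subjects    = ['flag', 'Flag', 'FLAG', 'fLaG'];
$sensitive   = [];
$insensitive = [];
foreach ($subjects as $s) {
    $sensitive[$s]   = preg_match('/flag/', $s);   // 1 only for the exact spelling
    $insensitive[$s] = preg_match('/flag/i', $s);  // 1 for every spelling
}

// Inline form: (?i) turns case-insensitivity on from that point in the pattern.
$inline = preg_match('/(?i)flag/', 'FLAG');

var_dump($sensitive, $insensitive, $inline);
```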

3. Greedy mode

The quantifiers "+" and "*" are greedy by default. Beginners may not yet have run into the problems this causes. If it is not clear what that means, here is an example from kano:

preg_match_all('/<span>.*<\/span>/', "<span>aaa</span><span>bbb</span>", $matches);
var_dump($matches);

The intention of this regular expression is to find every span tag in the string and put each one into the array, but the result is strange: both spans are matched at once! If you think about it, this is actually reasonable. The string "<span>aaa</span><span>bbb</span>" does indeed start with <span> and end with </span>, but the .* in between matches too much: "aaa</span><span>bbb" is all consumed by it. This is the greedy mode of "+" and "*". By default they match as many characters as possible. Adding a "?" after the quantifier turns greedy mode off, so that it matches as little as possible:

preg_match_all('/<span>.*?<\/span>/', "<span>aaa</span><span>bbb</span>", $matches);
var_dump($matches);

This time we got the results we wanted.
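
To compare the two modes side by side, here is a small sketch that runs the greedy and the lazy pattern on the same string. The third pattern, a negated character class with a capture group, is not from the original article. It is shown only as one common alternative, and the variable names are illustrative.

```php
<?php
$html = '<span>aaa</span><span>bbb</span>';

// Greedy: .* runs on to the last </span>, so there is a single match.
preg_match_all('/<span>.*<\/span>/', $html, $greedy);

// Lazy: .*? stops at the first </span> it can, so each tag is matched separately.
preg_match_all('/<span>.*?<\/span>/', $html, $lazy);

// Alternative: [^<]* cannot cross a "<"; the group captures only the inner text.
preg_match_all('/<span>([^<]*)<\/span>/', $html, $negated);

var_dump($greedy[0], $lazy[0], $negated[1]);
```

Here $greedy[0] holds one element (the whole string), $lazy[0] holds the two span tags, and $negated[1] holds the captured texts "aaa" and "bbb".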

Regular expression syntax is quite particular, and it is easy to slip up if you are not careful.
