A First Look at Python Regular Expressions

A regular expression is a special sequence of characters that describes a pattern. It is used to check whether a string matches that pattern.

Python has included the re module since version 1.5. It provides Perl-style regular expression patterns and brings full regular expression support to the language.

The following is a step-by-step introduction to regular expressions through examples.

For example, to check whether a string contains a certain character or substring, we usually use Python's built-in string operations, as follows:

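# Define a constant
a = '两点水|twowater|liangdianshui|草根程序员|ReadingWithU'
# Check whether the string contains "两点水" using Python's built-in operations.
# str.find returns -1 when the substring is absent (str.index would raise ValueError).
print('Contains "两点水": {0}'.format(a.find('两点水') > -1))
print('Contains "两点水": {0}'.format('两点水' in a))

The output is as follows:

Contains "两点水": True
Contains "两点水": True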
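
The two built-in approaches differ when the substring is absent, which a small sketch can make concrete: str.find returns -1, while str.index raises ValueError, so only find is safe to compare against -1. The substring 'python' below is a made-up value chosen for illustration, not part of the original example.

```python
a = '两点水|twowater|liangdianshui|草根程序员|ReadingWithU'
missing = 'python'  # hypothetical substring that does not occur in a

found_with_in = missing in a       # membership test gives a bool
found_with_find = a.find(missing)  # find gives -1 when nothing is found

try:
    a.index(missing)               # index raises instead of returning -1
    index_raised = False
except ValueError:
    index_raised = True

print(found_with_in, found_with_find, index_raised)
```

For a plain containment check, the in operator is usually the clearest choice.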

So how would we do the same thing with a regular expression?

As mentioned above, Python's re module provides the full regular expression functionality, so we start with one of its functions:

re.findall(pattern, string[, flags])

This function finds all substrings of string that match the regular expression and returns them as a list. It is used as follows:

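import re
# Define a constant
a = '两点水|twowater|liangdianshui|草根程序员|ReadingWithU'
# Search with a regular expression; findall returns a list of matches
findall = re.findall('两点水', a)
print(findall)
# An empty list is falsy, so this checks whether anything matched
if findall:
    print('a contains "两点水"')
else:
    print('a does not contain "两点水"')

Output:

['两点水']
a contains "两点水"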
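
As a small sketch of the other two things the signature above implies, the example below shows what findall returns when nothing matches and how the optional flags argument changes matching. The patterns 'python' and 'readingwithu' are assumptions picked for illustration; re.IGNORECASE is a standard flag of the re module.

```python
import re

a = '两点水|twowater|liangdianshui|草根程序员|ReadingWithU'

# No match: findall returns an empty list rather than raising an error
no_match = re.findall('python', a)

# The optional third argument takes flags; IGNORECASE makes matching case-insensitive
ignore_case = re.findall('readingwithu', a, re.IGNORECASE)

print(no_match)     # → []
print(ignore_case)  # → ['ReadingWithU']
```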

As the output shows, this achieves the same result as the built-in functions. It should be emphasized, however, that the example above is only meant to ease us into regular expressions. Written this way, the regular expression is pointless. Why?

Because the problem can be solved with Python's built-in functions, there is no need for a regular expression here. Moreover, the pattern in the example above is just a constant string, not a rule. The soul of regular expressions lies in rules, so this example does not accomplish much.

So how do we write regular expression rules? Let's take it step by step, starting with a simple one: find all the lowercase letters in the string. We pass the rule as the first argument to findall, where [a-z] matches any single lowercase ASCII letter, and pass the string to search as the second argument. The details are as follows:

import re
# Define a constant
a = '两点水|twowater|liangdianshui|草根程序员|ReadingWithU'
# Select every lowercase English letter in a
re_findall = re.findall('[a-z]', a)
print(re_findall)

The output result:

['t', 'w', 'o', 'w', 'a', 't', 'e', 'r', 'l', 'i', 'a', 'n', 'g', 'd', 'i', 'a', 'n', 's', 'h', 'u', 'i', 'e', 'a', 'd', 'i', 'n', 'g', 'i', 't', 'h']

In this way, we get all the lowercase letters in the string.
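
The same idea extends to other simple rules. The sketch below is an illustration beyond the original example: it assumes the uppercase class [A-Z] and the + quantifier (one or more repetitions), both standard regular expression syntax, to pick out uppercase letters and whole runs of lowercase letters.

```python
import re

a = '两点水|twowater|liangdianshui|草根程序员|ReadingWithU'

# [A-Z] matches one uppercase ASCII letter at a time
uppercase = re.findall('[A-Z]', a)

# [a-z]+ matches one or more consecutive lowercase ASCII letters
lowercase_runs = re.findall('[a-z]+', a)

print(uppercase)       # → ['R', 'W', 'U']
print(lowercase_runs)  # → ['twowater', 'liangdianshui', 'eading', 'ith']
```

Note that the runs break at uppercase letters, which is why 'ReadingWithU' yields 'eading' and 'ith'.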
