
Introduction to Python regular expressions and re library (code examples)

不言 (reposted), 2019-02-11 10:33:55

This article introduces Python regular expressions and the re library, with code examples.

A regular expression is a sequence of characters that defines a search pattern. Typically this pattern is used by string search algorithms for "find" or "find and replace" operations on strings, or for input validation.

1. Regular expression syntax

  • . matches any single character (except a newline, by default)

  • [] character set: matches one character from the set, e.g. [abc] or the range [a-z]

  • [^] negated character set: matches one character that is not in the set, e.g. [^abc]

  • * repeats the preceding character 0 or more times

  • + repeats the preceding character 1 or more times

  • ? repeats the preceding character 0 or 1 times

  • | matches either the expression on its left or the one on its right

  • {m} repeats the preceding character exactly m times

  • {m,n} repeats the preceding character m to n times (inclusive)

  • ^ matches the beginning of the string

  • $ matches the end of the string

  • () grouping: treats the enclosed expression as one unit (and captures the text it matches), so a quantifier such as + applies to all of it; | can be used inside, e.g. (abc|def)

  • \d digit, equivalent to [0-9] for ASCII text

  • \w word character, equivalent to [A-Za-z0-9_] for ASCII text (in Python 3, both \d and \w also match other Unicode digits and letters unless re.ASCII is used)
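The following sketch exercises most of the operators listed above on small made-up strings (the sample text is an illustrative assumption, not taken from the original article). The comment on each line shows the value it produces.

```python
import re

text = "cat cot cut c9t caat"

# .  any single character except a newline
dot = re.findall(r"c.t", text)                 # ['cat', 'cot', 'cut', 'c9t']

# [] / [^]  one character from (or not from) a set
in_set = re.findall(r"c[ao]t", text)           # ['cat', 'cot']
not_in_set = re.findall(r"c[^ao]t", text)      # ['cut', 'c9t']

# *  +  ?  {m}  {m,n}  repetition of the preceding character
words = "ct cat caat caaat"
star = re.findall(r"ca*t", words)              # ['ct', 'cat', 'caat', 'caaat']
plus = re.findall(r"ca+t", words)              # ['cat', 'caat', 'caaat']
opt = re.findall(r"ca?t", words)               # ['ct', 'cat']
exact = re.findall(r"ca{2}t", words)           # ['caat']
ranged = re.findall(r"ca{1,2}t", words)        # ['cat', 'caat']

# |  either alternative;  ()  groups "ab" so that + repeats the whole unit
alt = re.findall(r"cat|dog", "cat dog cow")            # ['cat', 'dog']
grouped = re.search(r"(ab)+", "xxababab").group(0)     # 'ababab'

# ^ $  anchor to the start / end of the string;  \d \w  digit / word character
anchored = [bool(re.search(r"^\d{3}$", s)) for s in ("123", "1234")]  # [True, False]
word = re.findall(r"\w+", "user_id=42!")       # ['user_id', '42']
```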

2. Using the re library in Python

The re library is part of Python's standard library and is used mainly for string matching. Import it with: import re

2.1. How regular expressions are written as strings

With the re library, regular expressions are conventionally written as raw strings, in the form
r'text'
In a raw string, backslashes are not treated as escape characters: in an ordinary string literal '\n' becomes a newline character, while r'\n' stays a backslash followed by n. Regular expressions contain many backslashes (\d, \w, ...), so writing them as raw strings avoids having to double every backslash.
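A short sketch of why raw strings matter; the sample strings are illustrative assumptions. It shows that a raw string and a doubled-backslash ordinary string describe the same pattern, and the classic pitfall with \b.

```python
import re

# A raw string keeps each backslash; an ordinary string needs it doubled.
same_pattern = r"\d{3}" == "\\d{3}"                     # True
lengths = (len(r"\n"), len("\n"))                        # (2, 1): backslash + 'n' vs one newline

# Both spellings give the same regular expression.
digits_raw = re.findall(r"\d{3}", "tel: 010-555")       # ['010', '555']
digits_plain = re.findall("\\d{3}", "tel: 010-555")     # ['010', '555']

# Pitfall: in an ordinary string "\b" is a backspace character (\x08),
# not the regex word boundary \b, so this pattern finds nothing.
plain = re.findall("\bcat\b", "a cat")                   # []
raw = re.findall(r"\bcat\b", "a cat")                    # ['cat']
```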

2.2. Main functions of the re library

  • re.search() scans a string for the first position where the regular expression matches and returns a match object

  • re.match() matches the regular expression only at the beginning of a string and returns a match object

  • re.findall() searches a string and returns all matching substrings as a list

  • re.split() splits a string at the matches of the regular expression and returns a list

  • re.finditer() searches a string and returns an iterator whose elements are match objects, one per match

  • re.sub() replaces every substring that matches the regular expression and returns the resulting string

2.2.1. re.search(pattern, string, flags=0)

Scans the string for the first position where the regular expression matches and returns a match object, or None if there is no match.

  • pattern: the regular expression, as a string or raw string

  • string: the string to be searched

  • flags: control flags that modify how the regular expression is applied, for example:

  • re.I (re.IGNORECASE): case-insensitive matching; [A-Z] also matches lowercase letters

  • re.M (re.MULTILINE): ^ matches at the start of every line of the string, not only at the start of the whole string

  • re.S (re.DOTALL): . matches every character, including the newline; by default it matches any character except a newline

Example:

import re
match = re.search(r'[1-9]\d{5}', 'BIT 100081')
if match:
    print(match.group(0))

Output: 100081
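The three flags described above for re.search() can be seen side by side in the following sketch. The two-line sample string is an assumption chosen so that each flag changes the result.

```python
import re

text = "Line one\nline two"

# re.I (re.IGNORECASE): letters match regardless of case
case_sensitive = re.findall(r"line", text)             # ['line']
ignore_case = re.findall(r"line", text, re.I)          # ['Line', 'line']

# re.M (re.MULTILINE): ^ also matches right after each newline
first_word = re.findall(r"^\w+", text)                 # ['Line']
line_starts = re.findall(r"^\w+", text, re.M)          # ['Line', 'line']

# re.S (re.DOTALL): . also matches the newline character
no_dotall = re.search(r"one.line", text)               # None
with_dotall = re.search(r"one.line", text, re.S).group(0)  # 'one\nline'

# Flags can be combined with |
combined = re.findall(r"^line", text, re.I | re.M)     # ['Line', 'line']
```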
2.2.2. re.match(pattern, string, flags=0)

Matches the regular expression only at the beginning of the string and returns a match object, or None if the beginning of the string does not match.

The parameters are the same as for re.search().
Example:

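import re
match = re.match(r'[1-9]\d{5}', 'BIT 100081')
if match:
    print(match.group(0))
else:
    print('no match')

Output: no match. re.match() only tries to match at the start of the string, and 'BIT 100081' begins with 'B', so it returns None. Calling match.group(0) without the check would raise AttributeError: 'NoneType' object has no attribute 'group'.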
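To contrast re.match() with re.search() directly, here is a minimal sketch with the same zip-code pattern; the second sample string, with the digits moved to the front, is an illustrative assumption.

```python
import re

pattern = r"[1-9]\d{5}"

# re.match only tries the beginning of the string ...
at_start = re.match(pattern, "BIT 100081")              # None
leading = re.match(pattern, "100081 BIT").group(0)      # '100081'

# ... whereas re.search scans the whole string.
anywhere = re.search(pattern, "BIT 100081").group(0)    # '100081'
```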
2.2.3. re.findall(pattern, string, flags=0)

Searches the string and returns all non-overlapping matching substrings as a list.

The parameters are the same as for re.search().
Example:

import re
ls=re.findall(r'[1-9]\d{5}', 'BIT100081 TSU100084')
print(ls)

Output: ['100081', '100084']
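One detail worth knowing: when the pattern contains capturing groups (), re.findall() returns the text of the groups instead of the whole match. A sketch on the article's sample string; the letter-prefix patterns are illustrative.

```python
import re

text = "BIT100081 TSU100084"

# No groups: a list of whole matches
whole = re.findall(r"[A-Z]+\d{6}", text)            # ['BIT100081', 'TSU100084']

# One group: a list of that group's text
one_group = re.findall(r"([A-Z]+)\d{6}", text)      # ['BIT', 'TSU']

# Several groups: a list of tuples, one element per group
two_groups = re.findall(r"([A-Z]+)(\d{6})", text)   # [('BIT', '100081'), ('TSU', '100084')]
```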
2.2.4. re.split(pattern, string, maxsplit=0, flags=0)

Splits the string at every match of the regular expression and returns the pieces as a list.

  • maxsplit: the maximum number of splits; the unsplit remainder of the string is returned as the last element

Example:

import re
print(re.split(r'[1-9]\d{5}', 'BIT100081 TSU100084'))
# Output: ['BIT', ' TSU', '']
print(re.split(r'[1-9]\d{5}', 'BIT100081 TSU100084', maxsplit=1))
# Output: ['BIT', ' TSU100084']
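Two further uses of re.split(), sketched on made-up inputs: splitting on several different delimiters at once, and keeping the separators by wrapping the pattern in a capturing group.

```python
import re

# Split on any run of commas, semicolons or whitespace
parts = re.split(r"[,;\s]+", "a, b;c  d")       # ['a', 'b', 'c', 'd']

# With a capturing group, the separators are kept in the result
with_seps = re.split(r"(\d+)", "a1b22c")         # ['a', '1', 'b', '22', 'c']
```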
2.2.5. re.finditer(pattern, string, flags=0)

Searches the string and returns an iterator over the matching results; each element is a match object.

The parameters are the same as for re.search().
Example:

import re
for m in re.finditer(r'[1-9]\d{5}', 'BIT100081 TSU100084'):
    print(m.group(0))  # finditer yields only successful matches, so no None check is needed
Output:
100081
100084
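Because re.finditer() yields Match objects rather than plain strings, it also gives the position of each match. A small sketch on the article's sample string:

```python
import re

text = "BIT100081 TSU100084"

# Pair each matched substring with its (start, end) position
found = [(m.group(0), m.span()) for m in re.finditer(r"[1-9]\d{5}", text)]
print(found)   # [('100081', (3, 9)), ('100084', (13, 19))]
```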
2.2.6. re.sub(pattern, repl, string, count=0, flags=0)

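Replaces every substring that matches the regular expression and returns the resulting string. The parameters pattern, string and flags are the same as for re.search(); in addition:

  • repl: the replacement string inserted for each match

  • count: the maximum number of replacements; 0 (the default) replaces every match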

Example:

import re
print(re.sub(r'[1-9]\d{5}', ':zipcode', 'BIT100081 TSU100084'))
Output:
BIT:zipcode TSU:zipcode
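A sketch of the count parameter plus two other forms of repl that re.sub() accepts: a replacement string with group backreferences such as \2, and a function that receives each Match object. The sample string is the article's; the specific replacements are illustrative.

```python
import re

text = "BIT100081 TSU100084"

# count=1: replace only the first match
first_only = re.sub(r"[1-9]\d{5}", ":zipcode", text, count=1)   # 'BIT:zipcode TSU100084'

# \1, \2 in repl insert the text captured by groups 1 and 2
swapped = re.sub(r"([A-Z]+)(\d{6})", r"\2-\1", text)            # '100081-BIT 100084-TSU'

# repl can be a function: it gets each Match object and returns the replacement
incremented = re.sub(r"\d+", lambda m: str(int(m.group(0)) + 1), text)  # 'BIT100082 TSU100085'
```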
2.3 An equivalent object-oriented usage of the re library

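import re
# Functional usage: a one-off operation
rst = re.search(r'[1-9]\d{5}', 'BIT 100081')
# Object-oriented usage: compile once, then reuse the pattern for multiple operations
pat = re.compile(r'[1-9]\d{5}')
rst = pat.search('BIT 100081')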

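regex = re.compile(pattern, flags=0)

re.compile() turns the string form of a regular expression into a compiled Pattern object. The compiled object provides the six operations above as methods (regex.search(), regex.match(), regex.findall(), regex.split(), regex.finditer(), regex.sub()), called without the pattern and flags arguments.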
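A sketch of the compiled form, reusing one Pattern object for all six operations on the article's sample string. The name zipcode is just an illustrative variable name.

```python
import re

zipcode = re.compile(r"[1-9]\d{5}")   # compile once ...
text = "BIT100081 TSU100084"

# ... then call the six operations as methods, without passing the pattern again
first = zipcode.search(text).group(0)                 # '100081'
at_start = zipcode.match(text)                        # None
every = zipcode.findall(text)                         # ['100081', '100084']
pieces = zipcode.split(text)                          # ['BIT', ' TSU', '']
spans = [m.span() for m in zipcode.finditer(text)]    # [(3, 9), (13, 19)]
replaced = zipcode.sub(":zipcode", text)              # 'BIT:zipcode TSU:zipcode'

# Flags are given once, at compile time
bit = re.compile(r"bit", re.I).findall(text)          # ['BIT']
```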

2.4 The Match object of the re library

A Match object is the result of one successful match and carries information about that match.

Attributes of the Match object:

  • .string: the text that was searched

  • .re: the Pattern object (compiled regular expression) used for the match

  • .pos: the position in the text where the search started

  • .endpos: the position in the text where the search ended

Methods of the Match object:

  • .group(0): returns the matched string

  • .start(): returns the index in the original string where the match starts

  • .end(): returns the index just past the end of the match in the original string

  • .span(): returns the tuple (.start(), .end())
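A sketch that prints each attribute and method listed above for one match on the article's sample string. The last lines use the optional start position of a compiled pattern's search() to show a non-zero .pos.

```python
import re

text = "BIT100081 TSU100084"
m = re.search(r"[1-9]\d{5}", text)

print(m.string)                       # BIT100081 TSU100084
print(m.re.pattern)                   # [1-9]\d{5}
print(m.pos, m.endpos)                # 0 19
print(m.group(0))                     # 100081
print(m.start(), m.end(), m.span())   # 3 9 (3, 9)

# Pattern.search(string, pos) starts the search at index 5, which shows up in .pos
m2 = re.compile(r"[1-9]\d{5}").search(text, 5)
print(m2.pos, m2.group(0))            # 5 100084
```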

2.5 Greedy and minimal matching in the re library

When a regular expression could match substrings of different lengths, which one is returned? By default the re library matches greedily: each quantifier consumes as many characters as it can, so the longest possible match at that position is returned.

Minimal (non-greedy) matching operators:

  • *? like *, repeats the preceding character 0 or more times, but matches as few as possible

  • +? like +, repeats the preceding character 1 or more times, but matches as few as possible

  • ?? like ?, repeats the preceding character 0 or 1 times, preferring 0

  • {m,n}? like {m,n}, repeats the preceding character m to n times (inclusive), but matches as few as possible

Whenever an operator can match different lengths, appending ? to it turns it into a minimal match.
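A sketch comparing the greedy and minimal versions of each operator; the sample strings are illustrative assumptions.

```python
import re

text = "PYANBNCNDN"
greedy = re.search(r"PY.*N", text).group(0)        # 'PYANBNCNDN' (runs to the last N)
lazy = re.search(r"PY.*?N", text).group(0)         # 'PYAN' (stops at the first N)
lazy_plus = re.search(r"PY.+?N", text).group(0)    # 'PYAN'

opt_lazy = re.search(r"ab??", "abb").group(0)              # 'a' (?? prefers zero)
range_greedy = re.search(r"a{2,4}", "aaaa").group(0)       # 'aaaa'
range_lazy = re.search(r"a{2,4}?", "aaaa").group(0)        # 'aa'

# A common practical case: pulling tags out of markup
tags_greedy = re.findall(r"<.*>", "<b>bold</b>")    # ['<b>bold</b>']
tags_lazy = re.findall(r"<.*?>", "<b>bold</b>")     # ['<b>', '</b>']
```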


Statement:
This article is reproduced from segmentfault.com. In case of infringement, please contact admin@php.cn to have it removed.