


How to use regular expressions with the re module in Python?
Python implements regular expressions through the re module, which searches, matches, and manipulates strings.
1. re.search() finds the first match anywhere in the string, while re.match() matches only at the beginning of the string.
2. Parentheses () capture subgroups of a match; groups can be named to improve readability.
3. re.findall() returns all non-overlapping matches, and re.finditer() returns an iterator of match objects.
4. re.sub() replaces matching text and accepts a function for dynamic replacement.
5. Common patterns include \d, \w, and \s, and flags such as re.IGNORECASE, re.MULTILINE, re.DOTALL, and re.VERBOSE adjust matching behavior.
6. Frequently used patterns can be compiled with re.compile() for reuse.
Write patterns as raw strings throughout to avoid escaping problems.
Regular expressions (regex) in Python are handled through the built-in re module. They allow you to search, match, and manipulate strings based on patterns. Here's how to use them effectively.
Basic Pattern Matching with re.search() and re.match()
The most common functions are re.search() and re.match(). Both return a match object on success and None otherwise.
- re.search(pattern, string) scans through the entire string and returns the first match.
- re.match(pattern, string) checks only at the beginning of the string.
    import re

    text = "Contact us at support@example.com"

    # Search for an email pattern anywhere in the string
    match = re.search(r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b', text)
    if match:
        print("Found:", match.group())  # Output: Found: support@example.com
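To make the difference between the two functions concrete, here is a minimal sketch that runs both on the same input. The sample string and variable names are illustrative assumptions, and re.fullmatch() is included only for comparison; the article itself does not cover it.

```python
import re

text = "Order 42 shipped"  # hypothetical sample string

# re.search() scans the whole string, so it finds the digits in the middle.
found_anywhere = re.search(r'\d+', text)

# re.match() only tries at position 0, where "O" is not a digit.
found_at_start = re.match(r'\d+', text)

# re.fullmatch() requires the pattern to cover the entire string.
covers_all = re.fullmatch(r'\w+ \d+ \w+', text)

print(found_anywhere.group())  # 42
print(found_at_start)          # None
print(covers_all is not None)  # True
```

Because a failed match returns None, calling .group() without checking the result first raises AttributeError, which is why the examples guard with an if.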
Note: Always use raw strings (r"") with regex to avoid issues with backslashes.
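A short sketch of why this matters, using \b as the example: in an ordinary string literal Python turns \b into a backspace character before the regex engine ever sees it, while a raw string passes the backslash through. The sample sentence is an assumption chosen for illustration.

```python
import re

# In a normal string literal, "\b" is the backspace character (one character).
plain = '\bcat\b'
# In a raw string the backslash is kept, so the regex engine sees \b (word boundary).
raw = r'\bcat\b'

sentence = "the cat sat"  # hypothetical sample string

print(len(plain), len(raw))              # 5 7
print(re.search(plain, sentence))        # None
print(re.search(raw, sentence).group())  # cat
```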
Extracting Data with Groups
You can capture parts of a match using parentheses ().
    log_line = "192.168.1.1 - - [23/Jan/2023:10:00:00] \"GET /index.html\" 200"
    pattern = r'(\d+\.\d+\.\d+\.\d+).*?\[(.*?)\].*?"(GET|POST) (.*?)"'

    match = re.search(pattern, log_line)
    if match:
        ip = match.group(1)      # 192.168.1.1
        date = match.group(2)    # 23/Jan/2023:10:00:00
        method = match.group(3)  # GET
        path = match.group(4)    # /index.html
You can also name groups for clarity:
    pattern = r'(?P<ip>\d+\.\d+\.\d+\.\d+).*?\[(?P<date>.*?)\].*?"(?P<method>GET|POST) (?P<path>.*?)"'

    match = re.search(pattern, log_line)
    if match:
        print(match.group('ip'))      # 192.168.1.1
        print(match.group('method'))  # GET
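When a pattern uses named groups, the match object can also hand back every captured field at once. The sketch below reuses the log line from the example above and calls groupdict(); the variable name fields is an illustrative assumption.

```python
import re

log_line = '192.168.1.1 - - [23/Jan/2023:10:00:00] "GET /index.html" 200'
pattern = (
    r'(?P<ip>\d+\.\d+\.\d+\.\d+).*?\[(?P<date>.*?)\].*?'
    r'"(?P<method>GET|POST) (?P<path>.*?)"'
)

match = re.search(pattern, log_line)

# groupdict() maps each group name to the text it captured.
fields = match.groupdict() if match else {}

print(fields['ip'])      # 192.168.1.1
print(fields['path'])    # /index.html
print(sorted(fields))    # ['date', 'ip', 'method', 'path']
```

A dictionary like this is convenient when the captured values are passed on to other code, since nothing downstream needs to know the group order.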
Finding All Matches with re.findall() and re.finditer()
- re.findall() returns a list of all non-overlapping matches.
- re.finditer() returns an iterator of match objects (more memory efficient for large texts).
    text = "Emails: alice@test.com, bob@domain.org, charlie@test.com"

    # Find all emails
    emails = re.findall(r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b', text)
    print(emails)  # ['alice@test.com', 'bob@domain.org', 'charlie@test.com']

    # Use finditer to get the position of each match
    for match in re.finditer(r'test\.com', text):
        print(f"Found at {match.start()}-{match.end()}")
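One detail worth knowing is that the shape of the re.findall() result depends on how many capturing groups the pattern has. The following sketch shows this on the same sample text, with deliberately simplified email patterns that are assumptions for illustration and not a robust email matcher.

```python
import re

text = "Emails: alice@test.com, bob@domain.org, charlie@test.com"

# No groups: each item is the whole match.
whole = re.findall(r'\w+@\w+\.\w+', text)

# One group: each item is just that group's text.
users = re.findall(r'(\w+)@\w+\.\w+', text)

# Several groups: each item is a tuple with one entry per group.
pairs = re.findall(r'(\w+)@(\w+\.\w+)', text)

print(whole)  # ['alice@test.com', 'bob@domain.org', 'charlie@test.com']
print(users)  # ['alice', 'bob', 'charlie']
print(pairs)  # [('alice', 'test.com'), ('bob', 'domain.org'), ('charlie', 'test.com')]
```

If both the full match and the groups are needed, re.finditer() is usually the better fit, because each match object exposes group(0) alongside the captured groups.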
Replacing Text with re.sub()
Use re.sub() to replace patterns in a string.
    text = "Call me at 555-123-4567 or 555-987-6543"

    # Replace all phone numbers with [PHONE]
    cleaned = re.sub(r'\d{3}-\d{3}-\d{4}', '[PHONE]', text)
    print(cleaned)  # Call me at [PHONE] or [PHONE]

    # You can also use a function for dynamic replacement
    def redact(match):
        return '[REDACTED-' + match.group()[0:3] + ']'

    redacted = re.sub(r'\d{3}-\d{3}-\d{4}', redact, text)
    print(redacted)  # Call me at [REDACTED-555] or [REDACTED-555]
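Besides a fixed string or a function, the replacement can refer back to captured groups, and the count argument limits how many matches are replaced. The sketch below illustrates both; the sample text with dates is a hypothetical input.

```python
import re

text = "Released 2023-01-23, patched 2023-02-07"  # hypothetical sample string

# \3, \2 and \1 in the replacement refer to the numbered groups of each match.
swapped = re.sub(r'(\d{4})-(\d{2})-(\d{2})', r'\3/\2/\1', text)

# \g<name> refers to a named group; count=1 replaces only the first match.
first_only = re.sub(r'(?P<year>\d{4})-\d{2}-\d{2}', r'\g<year>', text, count=1)

print(swapped)     # Released 23/01/2023, patched 07/02/2023
print(first_only)  # Released 2023, patched 2023-02-07
```

The replacement strings are raw strings for the same reason patterns are: otherwise Python would interpret \1 as an escape sequence before re.sub() sees it.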
Common Regex Patterns and Flags
Some frequently used patterns:
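- \d matches a digit.
- \w matches a word character (letter, digit, or underscore).
- \s matches a whitespace character.
- . matches any character except a newline.
- * repeats the preceding element zero or more times.
- + repeats the preceding element one or more times.
- ? makes the preceding element optional (zero or one).
- ^ matches at the start of the string.
- $ matches at the end of the string.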
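As a rough illustration of how the quantifiers and anchors behave, the sketch below runs a few patterns against one sample string. The string and variable names are assumptions, and the + quantifier (one or more) is included alongside * and ? for comparison.

```python
import re

sample = "id=7, id=42, id=, name=x"  # hypothetical sample string

one_or_more = re.findall(r'id=\d+', sample)   # + needs at least one digit
zero_or_more = re.findall(r'id=\d*', sample)  # * also accepts no digits
zero_or_one = re.findall(r'id=\d?', sample)   # ? takes at most one digit

print(one_or_more)   # ['id=7', 'id=42']
print(zero_or_more)  # ['id=7', 'id=42', 'id=']
print(zero_or_one)   # ['id=7', 'id=4', 'id=']

# ^ and $ anchor the pattern to the start and end of the string.
print(bool(re.search(r'^id', sample)))    # True
print(bool(re.search(r'^name', sample)))  # False
print(bool(re.search(r'x$', sample)))     # True
```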
Use flags to modify behavior:
    # Case-insensitive search
    re.search(r'hello', 'Hello World', re.IGNORECASE)

    # Multiline mode (^ and $ also match at line boundaries)
    re.search(r'^start', 'start\nstart', re.MULTILINE)

    # Dot matches newline
    re.search(r'.*', 'line1\nline2', re.DOTALL)

    # Verbose mode for readable patterns
    pattern = re.compile(r'''
        \b[A-Za-z0-9._%+-]+@             # Username
        [A-Za-z0-9.-]+\.[A-Za-z]{2,}\b   # Domain
    ''', re.VERBOSE)
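To show what the flags actually change, here is a small sketch that runs the same patterns with and without them and combines two flags using the | operator. The sample text and variable names are illustrative assumptions.

```python
import re

text = "start one\nSTART two\nend"  # hypothetical sample string

# Without MULTILINE, ^ anchors only at the very start of the string.
default_hits = re.findall(r'^start', text)

# With MULTILINE and IGNORECASE combined, ^ matches at the start of each line
# and letter case is ignored.
flag_hits = re.findall(r'^start', text, re.MULTILINE | re.IGNORECASE)

# Without DOTALL, . stops at the first newline; with it, . crosses lines.
one_line = re.search(r'.*', text).group()
all_lines = re.search(r'.*', text, re.DOTALL).group()

print(default_hits)       # ['start']
print(flag_hits)          # ['start', 'START']
print(one_line)           # start one
print(all_lines == text)  # True
```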
Compiling Patterns for Reuse
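If you use the same pattern many times, compile it once with re.compile() and reuse the resulting object. The re module already caches recently used patterns, so the speed gain is often modest, but a compiled pattern avoids the repeated cache lookup and gives the pattern a readable name.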
    phone_pattern = re.compile(r'\d{3}-\d{3}-\d{4}')

    text1 = "Call 123-456-7890"
    text2 = "Or try 999-888-7777"

    print(phone_pattern.findall(text1))  # ['123-456-7890']
    print(phone_pattern.findall(text2))  # ['999-888-7777']
The compiled object has all the same methods: .search(), .match(), .findall(), .sub(), etc.
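A brief sketch of reusing one compiled object across several of these methods follows. The list of lines and the variable names are hypothetical, and groups are added to the phone pattern only so that the area code can be pulled out.

```python
import re

phone_pattern = re.compile(r'(\d{3})-(\d{3})-(\d{4})')

# Hypothetical input lines
lines = ["Call 123-456-7890", "No number here", "Or try 999-888-7777"]

# .search() on the compiled object takes only the string to scan.
area_codes = [m.group(1) for m in map(phone_pattern.search, lines) if m]

# .sub() takes the replacement and the string; the pattern is already bound.
masked = [phone_pattern.sub('[PHONE]', line) for line in lines]

print(area_codes)  # ['123', '999']
print(masked)      # ['Call [PHONE]', 'No number here', 'Or try [PHONE]']
```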
Basically, the re module gives you powerful tools for string pattern matching. Start with simple patterns, use raw strings, and leverage compiling and flags as your needs grow. It's not magic, but it's extremely useful once you get the basics down.