Python: the title of a web page contains a newline. How can it be extracted with a regular expression? - PHP Chinese Network Q&A

女神的闺蜜爱上我 2017-06-22 11:51:43

I am writing a CSDN web crawler in Python. To grab a page's title I have always used the regular expression (?<=\<title\>).*?(?=\<), but it does not work on CSDN. Looking at the CSDN page source, I can see that the title is broken across several lines.

As a result, the original regular expression no longer matches. So my question is: when the title of a web page contains newlines like this, how can it be extracted with a regular expression?
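
The failure described above can be reproduced with a short sketch. The page source below is a made-up stand-in for the CSDN markup, and the pattern assumes the part stripped from the post was the `<title>` tag; neither is taken from the real site.

```python
import re

# Hypothetical page source: the title text sits on its own line.
html = "<head><title>\n  Example post title\n</title></head>"

# Assumed form of the original pattern (lookbehind on "<title>").
pattern = r"(?<=<title>).*?(?=<)"

# By default "." matches any character except "\n", so the lazy
# ".*?" cannot get from "<title>" across the line break to the next "<".
matches_default = re.findall(pattern, html)
print(matches_default)  # []
```

The same pattern would work on a title written on a single line, which is presumably why it only broke on CSDN.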

PS:

I don't want to use XPath or BeautifulSoup; I only want a regular expression.
CSDN does have an anti-crawler mechanism, but that is not the reason I couldn't extract the title.

Thank you all.

Following @caimaoy's method, I changed the regular expression to (?<=\<title\>)(?:.|\n)*?(?=\<), and the title is now extracted perfectly. Thank you all again.
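
A minimal sketch of that fix, again assuming the stripped lookbehind was `<title>` and using a made-up page source: the group `(?:.|\n)` matches either a non-newline character or a newline, so the lazy repetition can cross line breaks without any flag.

```python
import re

# Hypothetical page source with the title split across lines.
html = "<head><title>\n  Example post title\n</title></head>"

# "(?:.|\n)" is a non-capturing group: any character, newline included.
pattern = r"(?<=<title>)(?:.|\n)*?(?=<)"

raw_title = re.search(pattern, html).group(0)
# The match keeps the surrounding line breaks and indentation,
# so trim them before using the title.
title = raw_title.strip()

print(repr(raw_title))  # '\n  Example post title\n'
print(title)            # Example post title
```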

All replies (2)

仅有的幸福 2017-06-22 11:53:43, floor 2

re.M is multi-line mode.
To write the multi-line match yourself, see http://python3-cookbook.readt...
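
For readers comparing the flags, here is a small sketch of how `re.M` and `re.S` differ on this problem. In Python's `re` module, `re.M` (`re.MULTILINE`) only changes where `^` and `$` match, while `re.S` (`re.DOTALL`) is the flag that lets `.` match a newline. The sample text and the `<title>` lookbehind are assumptions for illustration.

```python
import re

# Hypothetical title element spread over several lines.
text = "<title>\nline one\nline two\n</title>"
pattern = r"(?<=<title>).*?(?=<)"

# re.M: "^" also matches at the start of every line.
line_starts = re.findall(r"^\w+", text, re.M)

# re.M leaves "." unchanged, so the title pattern still finds nothing.
with_m = re.findall(pattern, text, re.M)

# re.S makes "." match "\n" as well, so the pattern spans the lines.
with_s = re.findall(pattern, text, re.S)

print(line_starts)  # ['line', 'line']
print(with_m)       # []
print(with_s)       # ['\nline one\nline two\n']
```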

曾经蜡笔没有小新 2017-06-22 11:53:43, floor 1

Add a flag to the expression:

    import re
    title = '......'
    print(re.findall(r'(?<=\<title\>).+?(?=\<)', title, re.S))
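
As a variation on the flag approach, the sketch below wraps it in a helper. This is not the replier's code: `extract_title` is a hypothetical name, and the capturing group, the `[^>]*` allowance for attributes on the tag, and the `re.I` flag are all added assumptions. It stays regex-only, as the question asks.

```python
import re

def extract_title(html):
    """Return the text of the first <title> element, or None if absent."""
    # re.S lets "." cross newlines; re.I also accepts "<TITLE>".
    match = re.search(r"<title[^>]*>(.*?)</title>", html, re.S | re.I)
    if match is None:
        return None
    # Collapse line breaks and runs of spaces into single spaces.
    return " ".join(match.group(1).split())

print(extract_title("<title>\n  Example\n  post title\n</title>"))
# Example post title
```

A capturing group is used here instead of lookarounds so that the closing tag is matched explicitly, which avoids stopping at an unrelated `<` inside the title.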
