Python - The title of the web page contains a newline. How to extract it using regular expressions?
女神的闺蜜爱上我 2017-06-22 11:51:43

When writing a CSDN web crawler in Python, I had always extracted page titles with the regular expression (?<=\<title\>).*?(?=\<), but it does not work on CSDN. Looking at the CSDN page source, I found that the title is broken across several lines.

As a result, the original regular expression no longer matches. So the question is: when the title of a web page contains newlines like this, how can it be extracted with a regular expression?
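The failure can be reproduced with a minimal sketch. The `html` string below is a made-up stand-in for the CSDN page source (the real markup is not shown in the question), and the backslashes before `<` and `>` are dropped because Python's `re` does not need them. By default `.` matches any character except a newline, so the lazy `.*?` cannot get past the line break that follows `<title>`:

```python
import re

# Hypothetical page source: the title text sits on its own lines,
# similar to what the question describes for CSDN.
multi_line_html = "<html><head><title>\n    Example title\n</title></head></html>"
single_line_html = "<html><head><title>Example title</title></head></html>"

# The original pattern from the question (with the "*" quantifier).
pattern = re.compile(r"(?<=<title>).*?(?=<)")

# Works when the title is on one line.
single_match = pattern.search(single_line_html)
print(single_match.group(0))

# Fails when a newline follows <title>: "." does not match "\n".
multi_match = pattern.search(multi_line_html)
print(multi_match)
```

The second search finds nothing because the only position the lookbehind accepts is immediately followed by a newline, which `.` refuses to consume.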

PS:

  1. I don't want to use XPath or BeautifulSoup; I only want a regular expression.

  2. CSDN has its own anti-crawler mechanism, but that is not the reason I couldn't extract the title.

Thank you all.

Update: following @caimaoy's method, I changed the regular expression to (?<=\<title\>)(?:.|\n)*?(?=\<). After that, the title is extracted perfectly. Thank you all again.
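A minimal sketch of that fix, using a made-up `html` string in place of the real CSDN source. It also shows the `re.DOTALL` flag as an alternative that was not part of the original answer: the flag makes `.` match newlines too, so the original pattern can stay unchanged. Both variants assume the title text itself contains no `<`, since the lookahead stops at the first one.

```python
import re

# Hypothetical page source with the title spread over several lines.
html = "<html><head><title>\n    Example title\n</title></head></html>"

# Variant 1: (?:.|\n) matches any character, including a newline.
alt_match = re.search(r"(?<=<title>)(?:.|\n)*?(?=<)", html)

# Variant 2: keep the original pattern and let "." match newlines
# by passing the re.DOTALL flag.
dotall_match = re.search(r"(?<=<title>).*?(?=<)", html, re.DOTALL)

# The raw match still contains the surrounding line breaks and
# indentation, so strip it before use.
title_alt = alt_match.group(0).strip()
title_dotall = dotall_match.group(0).strip()

print(title_alt)
print(title_dotall)
```

The two variants capture the same text here. `re.DOTALL` is usually the simpler choice, because the alternation `(?:.|\n)` has to try two branches at every character.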
