Python: the web page title contains line breaks. How can I extract it with a regular expression?
女神的闺蜜爱上我 2017-06-22 11:51:43

While writing a CSDN web crawler in Python, I have always extracted page titles with the regular expression `(?<=\<title\>).*?(?=\</title\>)`, but it does not work on CSDN. Looking at the CSDN page source, I found that the title is split across several lines.

As a result, the original regular expression no longer matches. So my question is: when a page title contains line breaks like this, how can I extract it with a regular expression?
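A minimal sketch that reproduces the problem, assuming a made-up page source whose title spans several lines (the `html` string below is hypothetical, not CSDN's real markup). By default `.` matches any character except a newline, so the lazy `.*?` cannot cross the line break after `<title>`:

```python
import re

# Hypothetical page source: the <title> text spans several lines,
# similar to what the question describes for CSDN.
html = "<head>\n<title>\n    Example title - CSDN\n</title>\n</head>"

# The question's pattern; without flags, '.' never matches '\n'.
pattern = r'(?<=\<title\>).*?(?=\</title\>)'

# No match: the text between the tags contains newlines.
print(re.findall(pattern, html))  # -> []

# The same pattern works when the title sits on a single line.
print(re.findall(pattern, "<title>Example title - CSDN</title>"))
# -> ['Example title - CSDN']
```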

PS:

  1. I don't want to use XPath or BeautifulSoup; I only want a regular-expression solution.

  2. CSDN does have an anti-crawler mechanism, but that is not the reason I couldn't extract the title.

Thank you all.

Following @caimaoy's method, I changed the regular expression to `(?<=\<title\>)(?:.|\n)*?(?=\</title\>)`, and now the title is extracted perfectly. Thank you all again.
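Here is a small sketch of why the `(?:.|\n)*?` version works, again using a hypothetical multi-line `<title>`. The non-capturing group matches either "any character except newline" or a newline, so together it matches any character. The result still carries the line breaks and indentation around the title; calling `strip()` on it is an extra step added here, not part of the original answer:

```python
import re

# Hypothetical page source with a title split across lines.
html = "<head>\n<title>\n    Example title - CSDN\n</title>\n</head>"

# (?:.|\n) = any character including newline; *? keeps the match lazy so
# it stops at the first closing </title>.
pattern = r'(?<=\<title\>)(?:.|\n)*?(?=\</title\>)'

matches = re.findall(pattern, html)
print(matches)  # -> ['\n    Example title - CSDN\n']

# Remove the surrounding line breaks and indentation.
title = matches[0].strip() if matches else None
print(title)  # -> Example title - CSDN
```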


All replies (2)
仅有的幸福
  1. Use `re.M` (multi-line mode).

  2. Write the multi-line matching pattern yourself: http://python3-cookbook.readt...
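The two flags are easy to mix up, so here is a quick check on a hypothetical multi-line title. According to the Python `re` documentation, `re.M` (`re.MULTILINE`) only changes how `^` and `$` behave, while `re.S` (`re.DOTALL`) is the flag that makes `.` match newlines too:

```python
import re

# Hypothetical page source with a title split across lines.
html = "<head>\n<title>\n    Example title - CSDN\n</title>\n</head>"
pattern = r'(?<=\<title\>).*?(?=\</title\>)'

# re.M lets '^' and '$' match at every line boundary, but '.' still
# refuses '\n', so nothing is found.
print(re.findall(pattern, html, re.M))  # -> []

# re.S (DOTALL) lets '.' match newlines, so the title is found.
print(re.findall(pattern, html, re.S))  # -> ['\n    Example title - CSDN\n']
```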

曾经蜡笔没有小新

Add a flag to the expression:

    import re

    title = '......'  # stands in for the page's HTML source
    print(re.findall(r'(?<=\<title\>).+?(?=\</title\>)', title, re.S))
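A variation on the same `re.S` idea, sketched under a few assumptions: the `<title>` tag might carry attributes or use upper-case letters (the sample HTML is invented), and `extract_title` is a hypothetical helper name. Python's lookbehind must be fixed-width, so a variable-width opener like `<title[^>]*>` cannot go inside `(?<=...)`; a capture group avoids that. `(?s)` is the inline form of `re.S`:

```python
import re

# Hypothetical page source: attribute on the tag, upper-case tag name,
# and a title split across lines.
html = '<head>\n<TITLE lang="en">\n    Example title - CSDN\n</TITLE>\n</head>'

# (?s) = inline DOTALL, so '.' also matches newlines.
# re.I makes the tag names case-insensitive.
# <title[^>]*> is variable-width, so it is matched directly and the title
# is taken from capture group 1 instead of using a lookbehind.
TITLE_RE = re.compile(r'(?s)<title[^>]*>(.*?)</title>', re.I)


def extract_title(page):
    """Return the stripped <title> text, or None if the page has no title."""
    m = TITLE_RE.search(page)
    return m.group(1).strip() if m else None


print(extract_title(html))                    # -> Example title - CSDN
print(extract_title("<p>no title here</p>"))  # -> None
```

The lazy `.*?` stops at the first closing `</title>`, which matters if the page happens to contain more than one title-like block.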
php.cn: Public welfare online PHP training, helping PHP learners grow quickly!