Python crawls the data and gets a list, but how to remove the span tag in it?-PHP Chinese Network Q&A



我想大声告诉你 2017-05-18 10:55:53


I used Python 3.6 to crawl some data, but what I finally got was a list containing span tags. When I call get_text, contents, etc. on it, an error is raised. Why is this?
The initial results returned are as follows:

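[<span>2017.5.2</span>]
[<span>2017.4.26</span>]
[<span>2017.4.24</span>]
[<span>2017.4.19</span>]
[<span>2017.3.23</span>]
[<span>2017.3.17</span>]
[<span>2017.2.14</span>]
[<span>2017.2.9</span>]
[<span>2017.2.6</span>]
[<span>2017.2.6</span>]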

My code is as follows:

import requests
from bs4 import BeautifulSoup
import re

# def url_list():
#     for number in range(1,21):
#         url_links=[]
#         url="X".format(i=number)
#         url_links.append(url)

h = {"User-Agent": "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.81 Safari/537.36"}
r = requests.get("url", headers=h)
soup = BeautifulSoup(r.text, 'lxml')
for data in soup.find("p", {"class": "list-main-eventset-finan"}).find_all("li"):
    # find_all() returns a list-like ResultSet of <span> tags, not text
    content = data.find("i", {"class": "cell date"}).find_all("span")
    print(content)


reply all (3)

仅有的幸福 2017-05-18 10:57:53 (floor 3)

I don't remember the bs API very clearly, but there should be a function that returns the text directly. I think it is get_text(). Since you used find_all(), you get back a list of tags rather than a single tag, so you need to loop over the returned result once more and call get_text() on each element. That's all.

rs = list()
for data in soup.find("p", {"class": "list-main-eventset-finan"}).find_all("li"):
    contents = data.find("i", {"class": "cell date"}).find_all("span")
    # contents is a list of <span> tags, so call get_text() on each one
    for content in contents:
        rs.append(content.get_text())
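
If you want to try this without the original page, here is a minimal, self-contained sketch. The HTML string is made up for illustration (I used a ul wrapper instead of the p from the question so that any parser keeps the nesting), and html.parser is used so that lxml is not required. It is only meant to show that the loop over the find_all() result is what makes get_text() work.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, modeled on the class names in the question.
html = """
<ul class="list-main-eventset-finan">
  <li><i class="cell date"><span>2017.5.2</span></i></li>
  <li><i class="cell date"><span>2017.4.26</span></i></li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")

dates = []
for li in soup.find("ul", {"class": "list-main-eventset-finan"}).find_all("li"):
    spans = li.find("i", {"class": "cell date"}).find_all("span")
    # spans is a list-like ResultSet; spans.get_text() would raise
    # AttributeError, so get_text() is called on each Tag instead.
    for span in spans:
        dates.append(span.get_text(strip=True))

print(dates)  # ['2017.5.2', '2017.4.26']
```

If each i element only ever holds one span, calling find("span") instead of find_all("span") returns a single Tag and the inner loop is not needed.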

In addition, you can also use a regular expression to match the text directly, with a pattern such as <span>(.*?)</span>. You still have to loop over the contents list as above, though.
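
A rough sketch of the regular-expression route, using only the standard library. The strings in contents are hypothetical stand-ins for str(tag) of each element that find_all("span") returns, because re works on strings and not on Tag objects. The name SPAN_TEXT and the exact pattern are my own choices, not something from the original code.

```python
import re

# Hypothetical stand-ins for str(tag) of each <span> returned by find_all().
contents = [
    "<span>2017.5.2</span>",
    "<span>2017.4.26</span>",
    "<span>2017.4.24</span>",
]

# Non-greedy capture of whatever sits between the opening and closing span tag.
SPAN_TEXT = re.compile(r"<span[^>]*>(.*?)</span>", re.S)

rs = []
for content in contents:
    match = SPAN_TEXT.search(content)
    if match:  # skip anything that is not a span element
        rs.append(match.group(1).strip())

print(rs)  # ['2017.5.2', '2017.4.26', '2017.4.24']
```

For real pages get_text() is usually the safer choice, since a regular expression like this breaks on nested tags inside the span.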