<p class="l_post l_post_bright j_l_post clearfix " data-field='{"author":{"user_id":348570172, "user_name":"\u6446\u6446\u821e\u66f2","props":null},"content":{"post_id":31489927386,"is_anonym":false,"forum_id":874949,"thread_id":2108034524,"content":"912904081@qq.com\u8c22\u8c22\u6492","post_no":94,"type":"0","comment_num":0,"props":null,"post_index":0,"pb_tpoint":null}}'> <p class="d_author"> <ul class="p_author"> ... </p>
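I want to scrape the user_name and content values that sit on the outermost p tag itself. There are many nested tags inside it, and my scraper returns everything inside the p. How do I keep only the outer part that I need?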
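import requests
from bs4 import BeautifulSoup

r = requests.get("http://tieba.baidu.com/p/2108034524?pn=4")
soup = BeautifulSoup(r.content, "lxml")
# Each post is an outer <p class="l_post"> tag; the values live in its data-field attribute
users = soup.find_all("p", class_="l_post")
for user in users:
    print(user["data-field"])  # further processing goes here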
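Then process the extracted attribute value further. Reading the attribute touches only the outer tag, so none of the nested tags come along.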
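One possible way to do that further processing, as a minimal sketch: the data-field value in the question looks like JSON, so the standard json module should be able to decode it. The function name extract_fields and the constant SAMPLE_DATA_FIELD are hypothetical names introduced here for illustration, and the sample is a trimmed copy of the attribute value shown in the question. The sketch assumes every post carries author.user_name and content.content keys, which may not hold for all posts.

```python
import json

# Trimmed copy of the data-field value from the question (assumption: real
# posts follow the same layout). Raw string keeps the \uXXXX JSON escapes.
SAMPLE_DATA_FIELD = (
    r'{"author":{"user_id":348570172,'
    r'"user_name":"\u6446\u6446\u821e\u66f2","props":null},'
    r'"content":{"post_id":31489927386,'
    r'"content":"912904081@qq.com\u8c22\u8c22\u6492","post_no":94}}'
)


def extract_fields(data_field):
    """Return (user_name, content) from one data-field attribute string.

    Hypothetical helper: missing keys yield None instead of raising.
    """
    data = json.loads(data_field)  # also decodes the \uXXXX escapes
    author = data.get("author") or {}
    content = data.get("content") or {}
    return author.get("user_name"), content.get("content")


# Decode the sample; inside the loop from the answer above this would be
# extract_fields(user["data-field"]) for each matched tag.
user_name, content = extract_fields(SAMPLE_DATA_FIELD)
```

In the loop from the answer, calling extract_fields(user["data-field"]) in place of the print should give one (user_name, content) pair per post, without any of the nested tags.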