Example tutorial on using BeautifulSoup to grab p tags in Python

零下一度

Released: 2017-06-01 10:23:21

This article shows how to use BeautifulSoup to extract p tags from HTML in Python 3. It includes complete sample code for reference.

Preface

The example below parses an HTML string with BeautifulSoup and prints the p tags that have a given class name.

Sample code:

# -*- coding:utf-8 -*-
# Python 3
# XiaoDeng
# http://tieba.baidu.com/p/2460150866
# Tag operations


from bs4 import BeautifulSoup
import urllib.request
import re


# If the source is a URL, the page can be read like this:
# html_doc = "http://tieba.baidu.com/p/2460150866"
# req = urllib.request.Request(html_doc)
# webpage = urllib.request.urlopen(req)
# html = webpage.read()


html="""
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title" name="dromouse"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" rel="external nofollow" class="sister" id="xiaodeng"><!-- Elsie --></a>,
<a href="http://example.com/lacie" rel="external nofollow" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" rel="external nofollow" class="sister" id="link3">Tillie</a>;
<a href="http://example.com/lacie" rel="external nofollow" class="sister" id="xiaodeng">Lacie</a>
and they lived at the bottom of a well.</p>
<p class="ntopbar_loading"><img src="http://simg.sinajs.cn/blog7style/images/common/loading.gif">加载中…</p>

<p class="SG_connHead">
   <span class="title" comp_title="个人资料">个人资料</span>
   <span class="edit">
      </span>
<p class="info_list">  
         <ul class="info_list1">
     <li><span class="SG_txtc">博客等级：</span><span id="comp_901_grade"><img src="http://simg.sinajs.cn/blog7style/images/common/sg_trans.gif" real_src="http://simg.sinajs.cn/blog7style/images/common/number/9.gif" /></span></li>
     <li><span class="SG_txtc">博客积分：</span><span id="comp_901_score"><strong>0</strong></span></li>
     </ul>
     <ul class="info_list2">
     <li><span class="SG_txtc">博客访问：</span><span id="comp_901_pv"><strong>3,971</strong></span></li>
     <li><span class="SG_txtc">关注人气：</span><span id="comp_901_attention"><strong>0</strong></span></li>
     <li><span class="SG_txtc">获赠金笔：</span><strong id="comp_901_d_goldpen">0支</strong></li>
     <li><span class="SG_txtc">赠出金笔：</span><strong id="comp_901_r_goldpen">0支</strong></li>
     <li class="lisp" id="comp_901_badge"><span class="SG_txtc">荣誉徽章：</span></li>
     </ul>
     </p>
<p class="atcTit_more"><span class="SG_more"><a href="http://blog.sina.com.cn/" rel="external nofollow" target="_blank">更多>></a></span></p>
<p class="story">...</p>
"""
soup = BeautifulSoup(html, 'html.parser')  # document object



# p tags whose class name is xxx and whose text content is hahaha
for k in soup.find_all('p', class_='atcTit_more'):  # , string='更多'
    print(k)
    # <p class="atcTit_more"><span class="SG_more"><a href="http://blog.sina.com.cn/" rel="external nofollow" target="_blank">更多&gt;&gt;</a></span></p>
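The comment on the final loop mentions matching on both class name and text content. One way to do that is sketched below. It assumes a small sample document (`SAMPLE_HTML`) and a helper name (`find_p_by_class_and_text`), both invented for illustration. It checks the text with `get_text()` rather than the `string=` argument, because `string=` compares against the tag's `.string` value, so `'更多'` would not match the text `更多>>`.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, much smaller than the page in the listing above.
SAMPLE_HTML = """
<p class="atcTit_more"><span class="SG_more"><a href="http://blog.sina.com.cn/">更多>></a></span></p>
<p class="atcTit_more"><span>other</span></p>
<p class="story">更多</p>
"""


def find_p_by_class_and_text(soup, class_name, text):
    """Return p tags with the given class whose visible text contains `text`."""
    return [
        p for p in soup.find_all("p", class_=class_name)
        if text in p.get_text()
    ]


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
matches = find_p_by_class_and_text(soup, "atcTit_more", "更多")
for p in matches:
    # get_text() returns the unescaped text of the tag and its descendants.
    print(p.get_text(strip=True))
```

Only the first p tag matches: the second has the right class but different text, and the third has the right text but a different class.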

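The commented-out lines near the top of the listing read the page from a URL instead of a string. A minimal sketch of that path follows. The helper names (`soup_from_bytes`, `soup_from_url`), the `User-Agent` header, the timeout, and the UTF-8 fallback are all assumptions for illustration and not part of the original code. The demonstration at the end uses hard-coded bytes, so no network request is made.

```python
import urllib.request

from bs4 import BeautifulSoup


def soup_from_bytes(raw, encoding="utf-8"):
    """Decode raw page bytes and parse them with the built-in html.parser."""
    return BeautifulSoup(raw.decode(encoding, errors="replace"), "html.parser")


def soup_from_url(url, timeout=10):
    """Download `url` and parse the response body (performs a network request)."""
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        # Fall back to UTF-8 when the server does not declare a charset.
        encoding = resp.headers.get_content_charset() or "utf-8"
        return soup_from_bytes(resp.read(), encoding)


# Offline demonstration with hard-coded bytes instead of a live request.
demo = soup_from_bytes('<p class="story">加载中…</p>'.encode("utf-8"))
print(demo.p.get_text())
```

With a live page, `soup_from_url("http://tieba.baidu.com/p/2460150866")` would return a soup object that can be searched with `find_all` in the same way as in the listing.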