[Python] Web Crawler (6): A simple crawler for Baidu Tieba

黄舟

Release: 2017-01-21 14:07:39

# -*- coding: utf-8 -*-
#---------------------------------------
#   Program:  Baidu Tieba crawler
#   Version:  0.1
#   Author:   why
#   Date:     2013-05-14
#   Language: Python 2.7
#   Usage:    Enter the paginated thread URL with the trailing page number removed,
#             then set the first and last page numbers.
#   Function: Download every page in that range and save each one as an HTML file.
#---------------------------------------

import urllib2

# Download pages begin_page..end_page (inclusive) of a Tieba thread
def baidu_tieba(url, begin_page, end_page):
    for i in range(begin_page, end_page + 1):
        sName = str(i).zfill(5) + '.html'  # zero-pad to a five-digit file name
        print 'Downloading page ' + str(i) + ' and saving it as ' + sName + '......'
        f = open(sName, 'wb')  # binary mode: the response body is written unchanged
        m = urllib2.urlopen(url + str(i)).read()
        f.write(m)
        f.close()


#-------- Enter the parameters here ------------------

# Address of a thread in the Shandong University Baidu Tieba forum
#bdurl = 'http://tieba.baidu.com/p/2296017831?pn='
#iPostBegin = 1
#iPostEnd = 10

bdurl = raw_input('Enter the Tieba URL, without the number after pn=:\n')
begin_page = int(raw_input('Enter the first page number:\n'))
end_page = int(raw_input('Enter the last page number:\n'))
#-------- Enter the parameters here ------------------


# Run the crawler
baidu_tieba(bdurl, begin_page, end_page)
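
The listing above targets Python 2.7. On Python 3, where urllib2 no longer exists, the same download loop might look roughly like the sketch below. It is not the author's code: the names page_filename, fetch_bytes and download_pages, and the out_dir, timeout and fetch parameters, are assumptions introduced here for illustration.

```python
import os
from urllib.request import urlopen


def page_filename(page, width=5):
    """Return a zero-padded file name such as '00001.html'."""
    return str(page).zfill(width) + ".html"


def fetch_bytes(url, timeout=10):
    """Download one URL and return the raw response body as bytes."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


def download_pages(base_url, begin_page, end_page, out_dir=".", fetch=fetch_bytes):
    """Save pages begin_page..end_page (inclusive) and return the file paths."""
    saved = []
    for page in range(begin_page, end_page + 1):
        path = os.path.join(out_dir, page_filename(page))
        # Binary mode: the body is bytes and is stored without decoding.
        with open(path, "wb") as f:
            f.write(fetch(base_url + str(page)))
        saved.append(path)
    return saved
```

With the sample thread address from the listing, a call such as download_pages('http://tieba.baidu.com/p/2296017831?pn=', 1, 10) would write 00001.html through 00010.html to the current directory. The fetch argument is there only so that the network call can be replaced, for example by a stub when trying the function offline.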

Related labels:

Python, web crawler, Baidu Tieba

source：php.cn
