Community Learn Tools Library Leisure

English

Home > Backend Development > Python Tutorial > body text

python抓取京东商城手机列表url实例代码

WBOY

Release： 2016-06-16 08:45:50

Original

1375 people have browsed it

复制代码代码如下:

#-*- coding: UTF-8 -*-
'''
Created on 2013-12-5

@author: good-temper
'''

import urllib2
import bs4
import time

def getPage(urlStr):
    '''
                获取页面内容
    '''
    content = urllib2.urlopen(urlStr).read()
    return content

def getNextPageUrl(currPageNum):
    #http://list.jd.com/9987-653-655-0-0-0-0-0-0-0-1-1-页码-1-1-72-4137-33.html
    url = u'http://list.jd.com/9987-653-655-0-0-0-0-0-0-0-1-1-'+str(currPageNum+1)+'-1-1-72-4137-33.html'

    #是否有下一页
    content = getPage(url);
    soup = bs4.BeautifulSoup(content)
    list = soup.findAll('span',{'class':'next-disabled'});
    if(len(list) == 0):
        return url
    return ''

def analyzeList():
    pageNum = 0
    list = []
    url = getNextPageUrl(pageNum)
    while url !='':
        soup = bs4.BeautifulSoup(getPage(url))
        pagelist = soup.findAll('div',{'class':'p-name'})
        for elem in pagelist:
            soup1 = bs4.BeautifulSoup(str(elem))
            list.append(soup1.find('a')['href'])

        pageNum = pageNum+1
        print pageNum
        url = getNextPageUrl(pageNum)
    return list

def analyzeContent(url):

return ''

def writeToFile(list, path):
    f = open(path, 'a')
    for elem in list:
        f.write(elem+'\n')
    f.close()

if __name__ == '__main__':
    list = analyzeList()
    print '共抓取'+str(len(list))+'条\n'

    writeToFile(list, u'E:\\jd_phone_list.dat');

Related labels：

python

source：php.cn

Previous article：win7安装python生成随机数代码分享 Next article：Python抓取Discuz!用户名脚本代码

Statement of this Website

The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn

Latest Articles by Author

What is a NullPointerException, and how do I fix it?

2024-10-22 09:46:29
From Novice to Coder: Your Journey Begins with C Fundamentals

2024-10-13 13:53:41
Unlocking Web Development with PHP: A Beginner's Guide

2024-10-12 12:15:51
Demystifying C: A Clear and Simple Path for New Programmers

2024-10-11 22:47:31
Unlock Your Coding Potential: C Programming for Absolute Beginners

2024-10-11 19:36:51
Unleash Your Inner Programmer: C for Absolute Beginners

2024-10-11 15:50:41
Automate Your Life with C: Scripts and Tools for Beginners

2024-10-11 15:07:41
PHP Made Easy: Your First Steps in Web Development

2024-10-11 14:21:21
Build Anything with Python: A Beginner's Guide to Unleashing Your Creativity

2024-10-11 12:59:11
The Key to Coding: Unlocking the Power of Python for Beginners

2024-10-11 12:17:31

Latest Issues

Python/MySQL cannot persist integer data correctly No code is required here. I want to save a very long number because I'm making a game and ...

From 2024-04-04 19:09:44

0

1

367

Using selenium want to click and define URL in class I need another tip today. I'm trying to build Python/Selenium code and the idea is to clic...

From 2024-04-04 14:14:44

0

1

3492

Selenium + Python - inspect image via execute_script I need to verify that an image is displayed on the page using selenium in python. For exam...

From 2024-04-03 09:32:15

0

1

375

How to keep the first X rows and delete table rows I have a big table with millions of records in MySQLincident_archive, I want to sort the r...

From 2024-04-01 18:32:54

0

1

347

How to scrape specific Google Weather text using BeautifulSoup? How to find the course text "New York City, USA" in Python using BeautifulSoup? ...

From 2024-04-01 14:06:14

0

1

308

Related Topics

More>

Popular Recommendations

Popular Tutorials

More>

Related Tutorials

Popular Recommendations

Latest courses

Latest Downloads

More>

Web Effects

Website Source Code

Website Materials

Front End Template