python - Why does scrapy crawl fail to follow the next-page link?
ringa_lee 2017-04-17 17:36:45
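  1. The code is from "Learning Scrapy". I am still learning, and this code is essentially copied from the book. I wanted to run it and see the result, but it does not behave as the book describes: the spider never moves on to the next page to keep crawling. Please take a look.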

  2. Code:

import datetime
import urlparse
import socket

from scrapy.loader.processors import MapCompose, Join
from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule
from scrapy.loader import ItemLoader

from ..items import PropertiesItem


class EasySpider(CrawlSpider):
    name = 'easy'
    allowed_domains = ['http://192.168.99.100:32768/']
    start_urls = ['http://192.168.99.100:32768/properties/index_00000.html',]

    # Rules for horizontal and vertical crawling
    rules = (
        Rule(LinkExtractor(restrict_xpaths='//*[contains(@class,"next")]')),
        Rule(LinkExtractor(restrict_xpaths='//*[@itemprop="url"]'),
             callback='parse_item')
    )

    def parse_item(self, response):
        """ This function parses a property page.

        @url http://192.168.99.100:32768/properties/property_000000.html
        @returns items 1
        @scrapes title price description address image_urls
        @scrapes url project spider server date
        """
        # Create the loader using the response
        l = ItemLoader(item=PropertiesItem(), response=response)

        # Load fields using XPath expressions
        l.add_xpath('title', '//*[@itemprop="name"][1]/text()',
                    MapCompose(unicode.strip, unicode.title))
        l.add_xpath('price', './/*[@itemprop="price"][1]/text()',
                    MapCompose(lambda i: i.replace(',', ''), float),
                    re='[,.0-9]+')
        l.add_xpath('description', '//*[@itemprop="description"][1]/text()',
                    MapCompose(unicode.strip), Join())
        l.add_xpath('address',
                    '//*[@itemtype="http://schema.org/Place"][1]/text()',
                    MapCompose(unicode.strip))
        l.add_xpath('image_urls', '//*[@itemprop="image"][1]/@src',
                    MapCompose(lambda i: urlparse.urljoin(response.url, i)))

        # Housekeeping fields
        l.add_value('url', response.url)
        l.add_value('project', self.settings.get('BOT_NAME'))
        l.add_value('spider', self.name)
        l.add_value('server', socket.gethostname())
        l.add_value('date', datetime.datetime.now())

        return l.load_item()
  3. The address crawled above is served from Docker.
    The page source is:

    Scrapy Book Tutorial Example  
  

Page 4


All answers (1)
洪涛

Try swapping the order of the two rules.
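
Separately from the rule order, the `allowed_domains` entry in the question may be worth checking. Scrapy's documentation describes `allowed_domains` as a list of domains, not URLs, and requests to hosts outside that list are dropped as offsite. The sketch below is only a simplified, standard-library model of that kind of hostname check, not Scrapy's own code. The helper names `host_regex` and `should_follow` and the `index_00001.html` next-page URL are assumptions made for illustration, and it is written for Python 3 while the question's code targets Python 2.

```python
import re
from urllib.parse import urlparse


def host_regex(allowed_domains):
    """Build a pattern that accepts an allowed host or any subdomain of it."""
    # Assumption: entries are compared against the bare hostname only.
    alternatives = "|".join(re.escape(d) for d in allowed_domains)
    return re.compile(r"^(.*\.)?(%s)$" % alternatives)


def should_follow(url, allowed_domains):
    """Return True if the URL's hostname matches one of the allowed entries."""
    host = urlparse(url).hostname or ""
    return bool(host_regex(allowed_domains).search(host))


# Hypothetical next-page link; only index_00000.html appears in the question.
next_page = "http://192.168.99.100:32768/properties/index_00001.html"

# Entry as written in the question: a full URL never equals a bare hostname.
print(should_follow(next_page, ["http://192.168.99.100:32768/"]))  # False

# Host-only entry: no scheme, port, or path.
print(should_follow(next_page, ["192.168.99.100"]))  # True
```

If this model reflects what happens in the asker's setup, changing the spider attribute to `allowed_domains = ['192.168.99.100']` would be the thing to try. Running the crawl with debug logging and looking for messages about filtered offsite requests should confirm or rule it out.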
