python - How to fix the scrapy-redis idle-run problem?
巴扎黑 2017-07-04 13:44:26

In the scrapy-redis framework, the requests stored in redis under xxx:requests have all been crawled, but the program keeps running. How can the process be stopped automatically instead of idling forever?

2017-07-03 09:17:06 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2017-07-03 09:18:06 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)

The program can be stopped by calling engine.close_spider(spider, 'reason'), for example from the scheduler's next_request:

def next_request(self):
    block_pop_timeout = self.idle_before_close
    request = self.queue.pop(block_pop_timeout)
    if request and self.stats:
        self.stats.inc_value('scheduler/dequeued/redis', spider=self.spider)
    if request is None:
        # The redis queue returned nothing, so ask the engine to shut down.
        self.spider.crawler.engine.close_spider(self.spider, 'queue is empty')
    return request
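An alternative to patching next_request is a small Scrapy extension that watches the spider_idle signal and closes the spider after it has been idle for several ticks in a row. The sketch below is not from the original answer; the class name RedisIdleClose and the setting MAX_IDLE_NUMBER are made up for illustration, and the thresholds are placeholders.

# extensions.py - a minimal sketch, assuming a standard scrapy-redis setup.
# Enable it with something like:
#   EXTENSIONS = {'myproject.extensions.RedisIdleClose': 500}
#   MAX_IDLE_NUMBER = 10
from scrapy import signals
from scrapy.exceptions import NotConfigured


class RedisIdleClose:
    """Close the spider after N consecutive spider_idle signals.

    scrapy-redis keeps the spider alive by raising DontCloseSpider on every
    idle tick, so this extension counts those ticks and asks the engine to
    shut down once the threshold is reached.
    """

    def __init__(self, crawler, max_idle):
        self.crawler = crawler
        self.max_idle = max_idle
        self.idle_count = 0
        crawler.signals.connect(self.spider_idle, signal=signals.spider_idle)
        crawler.signals.connect(self.request_scheduled,
                                signal=signals.request_scheduled)

    @classmethod
    def from_crawler(cls, crawler):
        max_idle = crawler.settings.getint('MAX_IDLE_NUMBER', 0)
        if max_idle <= 0:
            raise NotConfigured
        return cls(crawler, max_idle)

    def request_scheduled(self, request, spider):
        # A newly scheduled request means the redis queue was not empty,
        # so reset the idle counter.
        self.idle_count = 0

    def spider_idle(self, spider):
        self.idle_count += 1
        if self.idle_count >= self.max_idle:
            self.crawler.engine.close_spider(spider, 'redis queue exhausted')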

There is one more thing I don't understand:
when the spider is closed via engine.close_spider(spider, 'reason'), a few errors show up before it actually shuts down.

# Normal shutdown:
2017-07-03 18:02:38 [scrapy.core.engine] INFO: Closing spider (queue is empty)
2017-07-03 18:02:38 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'finish_reason': 'queue is empty',
 'finish_time': datetime.datetime(2017, 7, 3, 10, 2, 38, 616021),
 'log_count/INFO': 8,
 'start_time': datetime.datetime(2017, 7, 3, 10, 2, 38, 600382)}
2017-07-03 18:02:38 [scrapy.core.engine] INFO: Spider closed (queue is empty)

# A few errors like the one below still appear before the spider finally closes.
# Does the spider start several threads when it launches, so that after one of them
# closes the spider the others can no longer find it and raise errors?

Unhandled Error
Traceback (most recent call last):
  File "D:/papp/project/launch.py", line 37, in <module>
    process.start()
  File "D:\Program Files\python3\lib\site-packages\scrapy\crawler.py", line 285, in start
    reactor.run(installSignalHandlers=False)  # blocking call
  File "D:\Program Files\python3\lib\site-packages\twisted\internet\base.py", line 1243, in run
    self.mainLoop()
  File "D:\Program Files\python3\lib\site-packages\twisted\internet\base.py", line 1252, in mainLoop
    self.runUntilCurrent()
  --- <exception caught here> ---
  File "D:\Program Files\python3\lib\site-packages\twisted\internet\base.py", line 878, in runUntilCurrent
    call.func(*call.args, **call.kw)
  File "D:\Program Files\python3\lib\site-packages\scrapy\utils\reactor.py", line 41, in __call__
    return self._func(*self._a, **self._kw)
  File "D:\Program Files\python3\lib\site-packages\scrapy\core\engine.py", line 137, in _next_request
    if self.spider_is_idle(spider) and slot.close_if_idle:
  File "D:\Program Files\python3\lib\site-packages\scrapy\core\engine.py", line 189, in spider_is_idle
    if self.slot.start_requests is not None:
builtins.AttributeError: 'NoneType' object has no attribute 'start_requests'

All replies (1)
我想大声告诉你

How do you know that the requests pushed to redis have all been crawled? You have to define that condition yourself.
If your case isn't complicated, you can use the built-in extension to close the spider:

scrapy.contrib.closespider.CloseSpider

CLOSESPIDER_TIMEOUT
CLOSESPIDER_ITEMCOUNT
CLOSESPIDER_PAGECOUNT
CLOSESPIDER_ERRORCOUNT
http://scrapy-chs.readthedocs...
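To make this concrete, here is a minimal settings.py sketch for the built-in CloseSpider extension; the values are only placeholders, not from the original answer, and a condition set to 0 is disabled:

# settings.py - placeholder thresholds for the CloseSpider extension
CLOSESPIDER_TIMEOUT = 3600      # close the spider after it has run for an hour
CLOSESPIDER_ITEMCOUNT = 10000   # ...or after scraping 10 000 items
CLOSESPIDER_PAGECOUNT = 0       # 0 disables the page-count condition
CLOSESPIDER_ERRORCOUNT = 50     # ...or after 50 errors have been raised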
