python - pyspider does not re-crawl a webpage after its data is updated
给我你的怀抱 2017-05-18 10:58:50

The webpage I crawl published a new entry today, but when the crawler ran afterwards it did not pick up the new entry.

from pyspider.libs.base_handler import *
from pyspider.database.mysql.mysqldb import SQL


class Handler(BaseHandler):
    crawl_config = {
    }

    @every(minutes=24 * 60)
    def on_start(self):
        self.crawl('http://www.yxztb.net/yxweb/zypd/012001/012001001/', callback=self.index_page)

    @config(age=10 * 24 * 60 * 60)
    def index_page(self, response):
        for each in response.doc('.tdmoreinfosub a').items():
            self.crawl(each.attr.href, callback=self.detail_page)

    @config(priority=2)
    def detail_page(self, response):
        return {
            "address": "宜兴市",
            "url": response.url,
            "title": response.doc('font span').text(),
            "date": response.doc('#tdTitle > .webfont').text()[8:17],
        }

    def on_result(self, result):
        print result
        if not result or not result['title']:
            return
        sql = SQL()
        sql.replace('zhaobiao', **result)

I would appreciate a detailed explanation from anyone experienced with this, and I am happy to discuss further.


All replies (2)
我想大声告诉你

The @config(age=...) setting is the cause: while the age has not expired, the request is ignored, so index_page is never executed.

    迷茫

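    Since the @every on on_start is one day, setting age=12 * 60 * 60 (half a day) is more appropriate: it guarantees that a scheduled @every run is never blocked by age. Also, @config(age=10 * 24 * 60 * 60) applies to the self.crawl calls that use index_page as their callback, which means the page will not be crawled again within 10 days.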
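
    To make the interaction between @every and age concrete, here is a minimal sketch in plain Python. It is a simplified model of the rule described above, not pyspider's actual scheduler code, and the names is_skipped and simulate are hypothetical helpers made up for this illustration. It assumes the crawler is triggered exactly once per day for 10 days.

```python
DAY = 24 * 60 * 60  # seconds in one day


def is_skipped(last_crawled_at, now, age):
    """Return True if a repeat request would be discarded as still fresh.

    Simplified model (an assumption, not pyspider's real code): a URL that
    was crawled less than `age` seconds ago is not fetched again.
    """
    if last_crawled_at is None:  # never crawled before, so always fetch
        return False
    return (now - last_crawled_at) < age


def simulate(age, every=DAY, days=10):
    """Count how many of the scheduled runs actually fetch the index page."""
    last_crawled_at = None
    fetched = 0
    for run in range(days):
        now = run * every  # time of this scheduled run, in seconds
        if not is_skipped(last_crawled_at, now, age):
            last_crawled_at = now
            fetched += 1
    return fetched


print(simulate(age=10 * DAY))      # age from the question, prints 1
print(simulate(age=12 * 60 * 60))  # age suggested in the reply, prints 10
```

    Under this model, the 10-day age lets only the first of ten daily runs fetch the index page, while the 12-hour age lets every daily run through. In the handler from the question, the corresponding change would be the age value in the @config decorator on index_page.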
