python - Scrapy returns duplicate content for CNKI responses
黄舟
黄舟 2017-06-30 09:55:07

I loop over the paginated URLs and issue one request per page:

for i in range(3):
    yield Request("http:xx/page/%s" % i, callback=self.parse_page)

Every request succeeds, but the response content is the same each time: it is always the content of the first request. Requesting the same paginated URLs one by one with Postman does not show this problem. Has my crawler been banned? It never behaved like this before.


All replies (3)
刘奇

Then you need to compare the request headers sent by Postman or a browser with the request headers sent by Scrapy, and look at where they differ.
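One way to make that comparison concrete is a small header diff. This is only a sketch: `diff_headers` is a hypothetical helper, and the header values below are made-up placeholders. In practice you would paste in the headers shown by Postman and the ones Scrapy actually sent (available from `response.request.headers` in the callback, decoded to strings).

```python
def diff_headers(reference, candidate):
    """Return {name: (reference_value, candidate_value)} for headers that differ.

    Header names are compared case-insensitively; a missing header is None.
    """
    ref = {k.lower(): v for k, v in reference.items()}
    cand = {k.lower(): v for k, v in candidate.items()}
    return {
        name: (ref.get(name), cand.get(name))
        for name in sorted(set(ref) | set(cand))
        if ref.get(name) != cand.get(name)
    }


# Placeholder values for illustration only; replace with the real headers.
postman_headers = {
    "User-Agent": "PostmanRuntime/7.0",
    "Accept-Language": "en",
    "Cookie": "session=abc",
}
scrapy_headers = {
    "user-agent": "Scrapy/1.4.0 (+http://scrapy.org)",
    "accept-language": "en",
}

differences = diff_headers(postman_headers, scrapy_headers)
for name, (expected, actual) in differences.items():
    print(f"{name}: postman={expected!r} scrapy={actual!r}")
```

Any header that shows up in the diff (a cookie or the User-Agent are common candidates) can then be tried on the Scrapy side by passing `headers=` to `Request` to see whether the pages start to differ.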

    三叔

    Your crawler has probably been recognized by the site's anti-crawling measures.

      洪涛

      Check the log printed to the console to see whether the next page was actually crawled, for example:
      2017-06-29 09:26:13 [scrapy] DEBUG: Scraped from <200 http:xx/page/x>
      Pay attention to whether the trailing x in http:xx/page/x changes from one entry to the next.
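If the log is long, the check can be scripted. The sketch below assumes the log has been saved as text and that its entries follow the "Scraped from <status URL>" shape quoted above. `scraped_urls` is a hypothetical helper, and the example.com URLs are placeholders rather than the real site.

```python
import re

# Matches the "Scraped from <STATUS URL>" part of a Scrapy log line.
SCRAPED_RE = re.compile(r"Scraped from <(\d{3}) ([^>]+)>")


def scraped_urls(log_text):
    """Return the URLs of all 'Scraped from' log entries, in order."""
    return [match.group(2) for match in SCRAPED_RE.finditer(log_text)]


# Sample log text with placeholder URLs (example.com is an assumption).
sample_log = """\
2017-06-29 09:26:13 [scrapy] DEBUG: Scraped from <200 http://example.com/page/0>
2017-06-29 09:26:14 [scrapy] DEBUG: Scraped from <200 http://example.com/page/1>
2017-06-29 09:26:15 [scrapy] DEBUG: Scraped from <200 http://example.com/page/2>
"""

urls = scraped_urls(sample_log)
# True when every scraped URL is distinct, i.e. the page number really changed.
pages_differ = len(set(urls)) == len(urls)
print(urls, pages_differ)
```

If the URLs are all distinct but the content is still identical, the server is likely answering every request with the same page, which points back to headers, cookies, or anti-crawling. If the URLs repeat, the problem is more likely in how the spider builds its requests.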
