python - Scrapy receives duplicate content in CNKI responses - PHP Chinese Network Q&A

黄舟 2017-06-30 09:55:07

I loop over the paginated URLs and request each page:

for i in range(3):  # request pages 0, 1 and 2
    yield Request("http:xx/page/%s" % i, callback=self.parse_page)

Every request succeeds, but the responses all have the same content: the content of the first request. Requesting each paginated URL separately in Postman does not have this problem. Have I been banned? This never happened before.
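
Rather than comparing pages by eye, one way to confirm the symptom is to fingerprint each response body and report every page whose content repeats an earlier one. This is a minimal plain-Python sketch: `find_duplicate_pages` is a hypothetical helper, and the sample bodies are made up. Inside a spider you would collect `(response.url, response.body)` pairs in `parse_page` and pass them in.

```python
import hashlib


def find_duplicate_pages(pages):
    """Return (url, first_url) pairs for pages whose body repeats an earlier one.

    `pages` is an iterable of (url, body_bytes) pairs in crawl order.
    """
    first_seen = {}  # SHA-256 digest of a body -> first URL that returned it
    duplicates = []
    for url, body in pages:
        digest = hashlib.sha256(body).hexdigest()
        if digest in first_seen:
            duplicates.append((url, first_seen[digest]))
        else:
            first_seen[digest] = url
    return duplicates


# Hypothetical responses reproducing the symptom: every page returns the same HTML.
sample = [
    ("http:xx/page/0", b"<html>results 1-20</html>"),
    ("http:xx/page/1", b"<html>results 1-20</html>"),
    ("http:xx/page/2", b"<html>results 1-20</html>"),
]
print(find_duplicate_pages(sample))
# [('http:xx/page/1', 'http:xx/page/0'), ('http:xx/page/2', 'http:xx/page/0')]
```

An empty result means every page returned distinct content.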

All replies (3)

刘奇 2017-06-30 09:57:07, reply #3

Then compare the request headers sent by Postman (or the browser) with the headers Scrapy sends, and look for differences.
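
A minimal sketch of that comparison, assuming you have copied both header sets into dicts by hand (for example from Postman's request details and from `response.request.headers` logged in the spider). Names are compared case-insensitively because HTTP header names are case-insensitive. The function name `diff_headers` and all header values below are hypothetical.

```python
def diff_headers(reference, scrapy_sent):
    """Compare two header dicts case-insensitively.

    Returns (missing, differing): header names present in `reference` but
    absent from `scrapy_sent`, and a dict of names whose values differ,
    mapped to (reference_value, scrapy_value).
    """
    ref = {k.lower(): v for k, v in reference.items()}
    sent = {k.lower(): v for k, v in scrapy_sent.items()}
    missing = sorted(set(ref) - set(sent))
    differing = {
        name: (ref[name], sent[name])
        for name in sorted(set(ref) & set(sent))
        if ref[name] != sent[name]
    }
    return missing, differing


# Hypothetical header sets; replace them with the ones you actually captured.
postman_headers = {
    "User-Agent": "Mozilla/5.0",
    "Cookie": "session=abc123",
    "Referer": "http:xx/page/0",
}
scrapy_headers = {
    "User-Agent": "Scrapy/x.y (+https://scrapy.org)",
    "Accept-Language": "en",
}
missing, differing = diff_headers(postman_headers, scrapy_headers)
print(missing)    # ['cookie', 'referer']
print(differing)  # {'user-agent': ('Mozilla/5.0', 'Scrapy/x.y (+https://scrapy.org)')}
```

Headers that appear only on the Postman side (here a cookie and a referer) are the first candidates to add to the Scrapy requests.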

三叔 2017-06-30 09:57:07, reply #2

Your crawler has been recognized by the site's anti-crawling measures.
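
If anti-crawling detection is the cause, a common first step is to make the spider send requests that look and pace more like the browser or Postman session that works. The fragment below is a sketch for a project's `settings.py` using standard Scrapy settings. The values are illustrative assumptions, not known requirements of CNKI, and they may not be enough to get past its checks.

```python
# settings.py (fragment): illustrative values, not tuned for any particular site.

# Send a browser-like User-Agent instead of Scrapy's default one.
USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0 Safari/537.36"
)

# Headers added to every request unless the request sets them itself.
DEFAULT_REQUEST_HEADERS = {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "zh-CN,zh;q=0.9",
}

# Cookies are on by default; keep them on so session state persists between requests.
COOKIES_ENABLED = True

# Wait between requests and let AutoThrottle adjust the delay to server load.
DOWNLOAD_DELAY = 2
AUTOTHROTTLE_ENABLED = True

# Download at most one request at a time from the site.
CONCURRENT_REQUESTS_PER_DOMAIN = 1
```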

洪涛 2017-06-30 09:57:07, reply #1

Look at the log printed to the console to see whether the next page was actually crawled:
2017-06-29 09:26:13 [scrapy] DEBUG: Scraped from <200 http:xx/page/x>
Check whether the final x (in http:xx/page/x) changes from one page to the next.
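
A small sketch of that check, assuming the console output has been saved to a string or file. It extracts the URL from each `Scraped from <...>` line in the format quoted above and lists the distinct URLs in the order they first appear. Scrapy logs one such line per scraped item, so the same URL can legitimately repeat, which is why the URLs are deduplicated. The function name and sample log are hypothetical.

```python
import re

# Matches lines such as:
# 2017-06-29 09:26:13 [scrapy] DEBUG: Scraped from <200 http:xx/page/x>
SCRAPED_RE = re.compile(r"Scraped from <(\d{3}) ([^>]+)>")


def distinct_page_urls(log_text):
    """Return the distinct URLs from 'Scraped from' lines, in first-seen order."""
    seen = []
    for _status, url in SCRAPED_RE.findall(log_text):
        if url not in seen:
            seen.append(url)
    return seen


# Hypothetical log: two items scraped from page 0, one each from pages 1 and 2.
sample_log = """\
2017-06-29 09:26:13 [scrapy] DEBUG: Scraped from <200 http:xx/page/0>
2017-06-29 09:26:13 [scrapy] DEBUG: Scraped from <200 http:xx/page/0>
2017-06-29 09:26:14 [scrapy] DEBUG: Scraped from <200 http:xx/page/1>
2017-06-29 09:26:15 [scrapy] DEBUG: Scraped from <200 http:xx/page/2>
"""

print(distinct_page_urls(sample_log))
# ['http:xx/page/0', 'http:xx/page/1', 'http:xx/page/2']
```

If this list contains fewer URLs than the pages you requested, the x at the end is not changing. If it contains all of them but the content is still identical, the problem lies in what the server returns for each URL.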
