Question
When crawling data with Scrapy, a successful request normally produces a debug line like this:
DEBUG: Crawled (200) <GET //m.sbmmt.com/> (referer: None)
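If the log instead shows
DEBUG: Crawled (403) <GET //m.sbmmt.com/> (referer: None)
the website is using an anti-crawling technique (Amazon does this, for example). The check here is a relatively simple one: the server inspects the User-Agent header of the request and refuses requests whose agent string does not look like a browser's.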
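One way to confirm that the User-Agent is what triggers the 403 is to request the same page outside Scrapy, once with and once without a browser-like agent string. The following is a minimal sketch using only the Python standard library; the helper names build_request and fetch_status are hypothetical and not part of the original article, and the URL passed in must be a full URL including its scheme.

```python
from urllib.error import HTTPError
from urllib.request import Request, urlopen


def build_request(url, user_agent=None):
    """Build a GET request, optionally with an explicit User-Agent header."""
    headers = {"User-Agent": user_agent} if user_agent else {}
    return Request(url, headers=headers)


def fetch_status(url, user_agent=None, timeout=10):
    """Return the HTTP status code the server sends for this request."""
    try:
        with urlopen(build_request(url, user_agent), timeout=timeout) as response:
            return response.status
    except HTTPError as err:
        # urlopen raises HTTPError for 4xx/5xx responses; the code is on the exception
        return err.code
```

If fetch_status(url) returns 403 while fetch_status(url, "your agent string") returns 200 for the same page, the User-Agent check is the likely cause.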
Solution
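Set a User-Agent in the request headers when the spider creates its requests, as shown below:
from scrapy import Request

def start_requests(self):
    # Send a browser-like User-Agent so the site does not reject the request
    yield Request("//m.sbmmt.com/", headers={'User-Agent': "your agent string"})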
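If every request in the project should carry the same header, Scrapy also provides a project-wide USER_AGENT setting. A minimal sketch for the project's settings.py might look like this; the agent string is only a placeholder and should be replaced with a real browser's value.

```python
# settings.py of the Scrapy project
# USER_AGENT is the default User-Agent header Scrapy sends with its requests.
# The string below is a placeholder; substitute a real browser's agent string.
USER_AGENT = "your agent string"
```

A User-Agent header set on an individual Request, as in start_requests, still takes precedence over this default.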