Configuring a proxy for the Python crawler Scrapy

高洛峰
Release: 2016-10-17 13:56:57

When crawling website content, the most common problem is that the site restricts requests by IP address as an anti-crawling measure. A common way around this is to rotate the IP address used for crawling by sending requests through proxies.

Here is how to configure a proxy in Scrapy and crawl through it.

1. Create a new file "middlewares.py" in the Scrapy project directory

# base64 is only needed if the proxy requires authentication
import base64


class ProxyMiddleware(object):
    # Override process_request, which Scrapy calls for every outgoing request
    def process_request(self, request, spider):
        # Set the location of the proxy
        request.meta['proxy'] = "http://YOUR_PROXY_IP:PORT"

        # Use the following lines only if your proxy requires authentication
        proxy_user_pass = "USERNAME:PASSWORD"
        # Set up basic authentication for the proxy; b64encode works on bytes
        # and, unlike the removed base64.encodestring, adds no trailing newline
        encoded_user_pass = base64.b64encode(proxy_user_pass.encode()).decode()
        request.headers['Proxy-Authorization'] = 'Basic ' + encoded_user_pass

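2. Add the following to the project configuration file (./pythontab/settings.py)

# Middlewares with lower numbers handle the request first, so the custom
# ProxyMiddleware (100) sets request.meta['proxy'] before HttpProxyMiddleware (110) runs.
# Before Scrapy 1.0 the built-in class lived at
# scrapy.contrib.downloadermiddleware.httpproxy.HttpProxyMiddleware
DOWNLOADER_MIDDLEWARES = {
    'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware': 110,
    'pythontab.middlewares.ProxyMiddleware': 100,
}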
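The introduction mentions rotating IPs, while the middleware from step 1 always sends requests through one fixed proxy. The following is a minimal sketch of how rotation could look, not a definitive implementation. The class name RotatingProxyMiddleware and the PROXY_LIST setting are hypothetical names chosen for illustration (PROXY_LIST is not a built-in Scrapy setting), and picking a proxy at random per request is only one possible rotation strategy.

```python
import random


class RotatingProxyMiddleware:
    """Downloader middleware sketch that picks a random proxy for each request."""

    def __init__(self, proxies):
        if not proxies:
            raise ValueError("at least one proxy URL is required")
        self.proxies = list(proxies)

    @classmethod
    def from_crawler(cls, crawler):
        # PROXY_LIST is an assumed custom setting holding proxy URLs,
        # e.g. ["http://10.0.0.1:8080", "http://10.0.0.2:8080"]
        return cls(crawler.settings.getlist("PROXY_LIST"))

    def process_request(self, request, spider):
        # Leave the request untouched if a proxy was already set explicitly
        if "proxy" not in request.meta:
            request.meta["proxy"] = random.choice(self.proxies)
        # Returning None tells Scrapy to continue processing the request
        return None
```

To try it, the class would be placed in middlewares.py, registered in DOWNLOADER_MIDDLEWARES in place of 'pythontab.middlewares.ProxyMiddleware', and PROXY_LIST would be defined in settings.py with the proxies you actually have access to.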


source:php.cn