Detailed explanation of how Python crawlers use proxy to crawl web pages-Python Tutorial-php.cn


高洛峰

Release: 2017-03-19 14:43:46


Proxies come in several types: transparent, anonymous, distorting (also called obfuscating) and high-anonymity proxies. This article collects some notes on using proxies in Python crawlers, along with a proxy pool class, to make complex crawling problems at work easier to handle.
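The four proxy types differ mainly in what the target server can learn from the request headers the proxy forwards. A common convention (not a formal standard, and real proxies vary) is: a transparent proxy passes your real IP in X-Forwarded-For, an anonymous proxy reveals itself (for example with a Via header) but hides your IP, a distorting proxy sends a fake client IP, and a high-anonymity proxy adds no such headers. The hypothetical `classify_proxy()` below is a rough heuristic under those assumptions; `headers_seen` would be the headers echoed back by a test endpoint you control.

```python
from typing import Mapping


def classify_proxy(headers_seen: Mapping[str, str], real_ip: str) -> str:
    """Heuristically classify a proxy from the headers the target received.

    Assumes the common header conventions described above; it is a rough
    guide, not a reliable detector.
    """
    # Header names are case-insensitive, so normalise them first
    h = {k.lower(): v for k, v in headers_seen.items()}

    # Client IPs the proxy reported to the target
    reported = [ip.strip() for ip in h.get("x-forwarded-for", "").split(",") if ip.strip()]
    if "x-real-ip" in h:
        reported.append(h["x-real-ip"].strip())

    if real_ip in reported:
        return "transparent"      # your real IP is passed on
    if reported:
        return "distorting"       # a client IP is sent, but not yours
    if "via" in h or "proxy-connection" in h:
        return "anonymous"        # the proxy is visible, your IP is hidden
    return "high-anonymity"       # no obvious sign of a proxy
```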

Using a proxy with the urllib module

Using a proxy with urllib (urllib2 in Python 2) is more cumbersome. You first create a ProxyHandler object, then use it to build an opener that opens web pages, and finally install that opener with install_opener() so that subsequent urllib.request.urlopen() calls go through it.

The proxy URL format is "http://127.0.0.1:80". If the proxy requires a username and password, use "http://user:password@127.0.0.1:80".
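If the username or password contains characters such as `@`, `:` or `/`, they should be percent-encoded, otherwise the proxy URL is split in the wrong place. A minimal sketch using `urllib.parse.quote`; `build_proxy_url` is a hypothetical helper name. Both urllib's ProxyHandler and requests decode the credentials again before sending them to the proxy.

```python
from typing import Optional
from urllib.parse import quote


def build_proxy_url(host: str, port: int, user: Optional[str] = None,
                    password: Optional[str] = None, scheme: str = "http") -> str:
    """Build a proxy URL, optionally with percent-encoded credentials."""
    if user is None:
        return f"{scheme}://{host}:{port}"
    # safe="" makes quote() encode '/' as well, so no reserved character
    # in the credentials can break URL parsing
    cred = f"{quote(user, safe='')}:{quote(password or '', safe='')}"
    return f"{scheme}://{cred}@{host}:{port}"


print(build_proxy_url("127.0.0.1", 80, "user", "p@ss:w/rd"))
# http://user:p%40ss%3Aw%2Frd@127.0.0.1:80
```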

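import urllib.request

proxy = "http://127.0.0.1:80"
# Create a ProxyHandler that sends both HTTP and HTTPS traffic through the proxy
proxy_support = urllib.request.ProxyHandler({'http': proxy, 'https': proxy})
# Build an opener that uses the ProxyHandler
opener = urllib.request.build_opener(proxy_support)
# Install the opener globally so that urlopen() uses it
urllib.request.install_opener(opener)
# Open a URL (timeout is in seconds)
r = urllib.request.urlopen('http://youtube.com', timeout=500)
print(r.read().decode('utf-8', errors='replace'))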

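install_opener() changes the default for every later urlopen() call in the process. If only some requests should go through the proxy, a sketch like the following keeps the opener local and calls `opener.open()` directly instead. `make_proxy_opener` and `fetch` are hypothetical helper names, and the proxy address is the placeholder used above.

```python
import urllib.request


def make_proxy_opener(proxy_url: str) -> urllib.request.OpenerDirector:
    """Build (but do not install) an opener that routes HTTP and HTTPS via proxy_url."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    # build_opener() skips its default ProxyHandler because we supply our own
    return urllib.request.build_opener(handler)


def fetch(url: str, proxy_url: str, timeout: float = 10.0) -> str:
    """Fetch url through proxy_url without touching the global urlopen() default."""
    opener = make_proxy_opener(proxy_url)
    with opener.open(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

Usage: `fetch("http://youtube.com", "http://127.0.0.1:80")`. Code elsewhere that calls `urllib.request.urlopen()` keeps its normal behavior.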

Using a proxy with the requests module

Using a proxy with requests is much simpler than with urllib. The example here uses a single proxy; if you send many requests through the same proxy, you can wrap a requests Session in a class.

If you need to use a proxy, you can configure a single request by providing the proxies parameter to any request method:

import requests

proxies = {
    "http": "http://127.0.0.1:3128",
    "https": "http://127.0.0.1:2080",
}
# timeout (in seconds) keeps a dead proxy from blocking forever
r = requests.get("http://youtube.com", proxies=proxies, timeout=10)
print(r.text)

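One way to follow the suggestion of building a class around a session is sketched below. `ProxySession` is a hypothetical name; it reuses one `requests.Session` (so connections to the proxy can be kept alive) and passes the proxy mapping on every call. Passing proxies per request is a deliberate choice: in a number of requests versions, proxies set on `Session.proxies` can be overridden by the HTTP_PROXY/HTTPS_PROXY environment variables, whereas per-request proxies take precedence.

```python
import requests


class ProxySession:
    """Hypothetical wrapper: one requests.Session, one proxy mapping."""

    def __init__(self, proxies: dict, timeout: float = 10.0) -> None:
        self.proxies = dict(proxies)
        self.timeout = timeout
        self.session = requests.Session()

    def get(self, url: str, **kwargs) -> requests.Response:
        # Pass a copy: requests may add environment proxies into the dict it receives
        kwargs.setdefault("proxies", dict(self.proxies))
        kwargs.setdefault("timeout", self.timeout)
        return self.session.get(url, **kwargs)

    def close(self) -> None:
        self.session.close()

    def __enter__(self) -> "ProxySession":
        return self

    def __exit__(self, *exc) -> None:
        self.close()
```

Usage: `with ProxySession({"http": "http://127.0.0.1:3128", "https": "http://127.0.0.1:2080"}) as s: print(s.get("http://youtube.com").text)`.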

You can also configure the proxy through the environment variables HTTP_PROXY and HTTPS_PROXY.

export HTTP_PROXY="http://127.0.0.1:3128"
export HTTPS_PROXY="http://127.0.0.1:2080"
python
>>> import requests
>>> r = requests.get("http://youtube.com")
>>> print(r.text)

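To confirm which proxies Python will pick up from these variables, the standard library's `urllib.request.getproxies()` returns the mapping urllib uses by default; requests also honors the same variables unless `Session.trust_env` is set to False. The sketch below sets the variables from inside Python instead of the shell, which affects only the current process and any child processes it starts. `set_proxy_env` is a hypothetical helper.

```python
import os
import urllib.request


def set_proxy_env(http_proxy: str, https_proxy: str) -> dict:
    """Set HTTP_PROXY/HTTPS_PROXY for this process and return what urllib sees."""
    # urllib gives lowercase variables (http_proxy, https_proxy) precedence
    # over uppercase ones, so clear them to make the uppercase values apply
    for name in ("http_proxy", "https_proxy"):
        os.environ.pop(name, None)
    os.environ["HTTP_PROXY"] = http_proxy
    os.environ["HTTPS_PROXY"] = https_proxy
    # Keys in the returned mapping are lowercase scheme names
    return urllib.request.getproxies()


print(set_proxy_env("http://127.0.0.1:3128", "http://127.0.0.1:2080"))
```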

If your proxy requires HTTP Basic Auth, use the http://user:password@host/ syntax:

proxies = {
    "http": "http://user:pass@127.0.0.1:3128/",
}

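requests turns the `user:pass` part of the proxy URL into a `Proxy-Authorization` header. The sketch below shows this with `HTTPAdapter.proxy_headers()`, the requests method that builds that header, and lists the proxy under both keys; with only an "http" key, https:// URLs would not use this proxy unless HTTPS_PROXY is set in the environment. `proxy_auth_header` is a hypothetical helper, and the credentials are the placeholders from above.

```python
from requests.adapters import HTTPAdapter

# Placeholder credentials and address from the example above
PROXY = "http://user:pass@127.0.0.1:3128/"

# Route both plain HTTP and HTTPS requests through the authenticated proxy
PROXIES = {"http": PROXY, "https": PROXY}


def proxy_auth_header(proxy_url: str) -> dict:
    """Return the header requests would send to authenticate with proxy_url."""
    # proxy_headers() reads user:pass from the URL and encodes it as Basic auth
    return HTTPAdapter().proxy_headers(proxy_url)


print(proxy_auth_header(PROXY))
# {'Proxy-Authorization': 'Basic dXNlcjpwYXNz'}
```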

Using a proxy in Python is simple; the most important thing is to find a stable, reliable proxy. If you have any questions, please leave a message.
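Since proxies are often unstable, crawlers commonly spread requests over a pool of them, as the proxy pool class mentioned at the start suggests. Below is a minimal, hypothetical sketch: `ProxyPool` hands out a random proxy and drops one after `max_failures` consecutive failures, so unreliable proxies are weeded out over time. The class name, the failure threshold and the retry policy in `fetch_with_pool()` are illustrative assumptions, not a standard design.

```python
import random
from typing import Dict, List


class ProxyPool:
    """Hypothetical minimal proxy pool.

    Hands out proxies at random and drops a proxy once it has failed
    max_failures times in a row. A success resets its failure count.
    """

    def __init__(self, proxies: List[str], max_failures: int = 3) -> None:
        self.max_failures = max_failures
        # proxy URL -> number of consecutive failures
        self._failures: Dict[str, int] = {p: 0 for p in proxies}

    def __len__(self) -> int:
        return len(self._failures)

    def get(self) -> str:
        if not self._failures:
            raise RuntimeError("proxy pool is empty")
        return random.choice(list(self._failures))

    def report_success(self, proxy: str) -> None:
        if proxy in self._failures:
            self._failures[proxy] = 0

    def report_failure(self, proxy: str) -> None:
        if proxy not in self._failures:
            return
        self._failures[proxy] += 1
        if self._failures[proxy] >= self.max_failures:
            del self._failures[proxy]  # treat the proxy as dead


def fetch_with_pool(pool: ProxyPool, url: str, attempts: int = 3):
    """Try up to `attempts` proxies from the pool and return the first response."""
    import requests  # imported here so ProxyPool itself needs only the stdlib

    last_error = None
    for _ in range(attempts):
        proxy = pool.get()
        try:
            r = requests.get(url, proxies={"http": proxy, "https": proxy}, timeout=10)
        except (requests.ConnectionError, requests.Timeout) as exc:
            # Connection-level errors (including ProxyError) count against the proxy
            pool.report_failure(proxy)
            last_error = exc
            continue
        pool.report_success(proxy)
        return r
    raise RuntimeError(f"all {attempts} attempts failed") from last_error
```

A production pool would usually also refill itself from a proxy source and re-check dropped proxies periodically; this sketch leaves that out.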
