
Python crawler uses proxy to crawl web pages

大家讲道理
Release: 2016-11-07 10:59:51

Proxies come in several types: transparent, anonymous, obfuscated (distorting), and high-anonymity proxies. Below are some notes on using proxies in Python crawlers, along with a proxy pool class. They should help when dealing with the more complicated crawling problems that come up at work.

Using a proxy with the urllib module

Using a proxy with urllib (urllib2 in Python 2) takes a few steps. First create a ProxyHandler object, then use it to build the opener that opens web pages, and finally install that opener so that urllib.request.urlopen uses it.

The proxy format is "http://127.0.0.1:80". If the proxy requires a username and password, use "http://user:password@127.0.0.1:80".
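Usernames and passwords that contain characters such as "@" or ":" would break this format unless they are percent-encoded. The sketch below builds a proxy URL with the standard library's urllib.parse.quote. The helper name build_proxy_url and its parameters are hypothetical and only serve as illustration.

```python
from urllib.parse import quote


def build_proxy_url(host, port, user=None, password=None, scheme="http"):
    """Return a proxy URL such as "http://user:password@127.0.0.1:80".

    build_proxy_url is a hypothetical helper, not part of urllib.
    """
    if user is None:
        return f"{scheme}://{host}:{port}"
    # Percent-encode the credentials so characters such as "@" or ":"
    # cannot be mistaken for URL delimiters.
    credentials = quote(user, safe="")
    if password is not None:
        credentials += ":" + quote(password, safe="")
    return f"{scheme}://{credentials}@{host}:{port}"


print(build_proxy_url("127.0.0.1", 80))
print(build_proxy_url("127.0.0.1", 80, "user", "p@ss:word"))
```

Both urllib and requests are expected to decode the percent-encoded credentials before sending them to the proxy, but it is worth verifying this against the proxy you actually use.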

import urllib.request

proxy = "http://127.0.0.1:80"

# Create a ProxyHandler object
proxy_support = urllib.request.ProxyHandler({'http': proxy})
# Create an opener object
opener = urllib.request.build_opener(proxy_support)
# Install the opener so that urlopen uses it
urllib.request.install_opener(opener)
# Open a URL (timeout is in seconds)
r = urllib.request.urlopen('http://youtube.com', timeout=500)
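install_opener changes the behavior of urlopen for the whole process. If only some requests should go through the proxy, one option is to skip the install step and call the opener directly. This is a minimal sketch: make_proxy_opener is a hypothetical helper name, and example.com in the comment is only a placeholder URL.

```python
import urllib.request


def make_proxy_opener(proxy):
    """Build an opener that routes HTTP and HTTPS requests through `proxy`."""
    handler = urllib.request.ProxyHandler({"http": proxy, "https": proxy})
    return urllib.request.build_opener(handler)


opener = make_proxy_opener("http://127.0.0.1:80")
# Only requests made through this opener use the proxy, for example:
#     with opener.open("http://example.com", timeout=10) as response:
#         html = response.read()
```

Other code that calls urllib.request.urlopen is unaffected, because no opener was installed globally.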

Using a proxy with the requests module

Using a proxy with requests is much simpler than with urllib. The example here uses a single proxy. If you need the same proxy for many requests, you can set it on a Session object.

To use a proxy, pass the proxies argument to any request method to configure that individual request:

import requests
proxies = {
  "http": "http://127.0.0.1:3128",
  "https": "http://127.0.0.1:2080",
}
r = requests.get("http://youtube.com", proxies=proxies)
print(r.text)
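For repeated requests, the proxies can be stored on a requests.Session so they do not have to be passed on every call. The following is a minimal sketch that reuses the addresses from the example above. It assumes the requests package is installed and makes no network call itself.

```python
import requests

session = requests.Session()
# Every request sent through this session will use these proxies.
session.proxies.update({
    "http": "http://127.0.0.1:3128",
    "https": "http://127.0.0.1:2080",
})
# Example call (needs a proxy actually listening on those ports):
#     r = session.get("http://youtube.com", timeout=10)
#     print(r.text)
```

Note that proxies found in environment variables may take precedence over session.proxies, so passing proxies on the individual request is the more predictable choice when both are set.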

You can also configure proxies through the environment variables HTTP_PROXY and HTTPS_PROXY.

export HTTP_PROXY="http://127.0.0.1:3128"
export HTTPS_PROXY="http://127.0.0.1:2080"
python
>>> import requests
>>> r = requests.get("http://youtube.com")
>>> print(r.text)
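The standard library reads the same kind of variables, so urllib.request.getproxies() is a quick way to check which proxies Python detects before starting a crawl. In this sketch the variables are set from inside Python purely for demonstration, using the lowercase names because urllib prefers them when both spellings exist. The function name detected_proxies is hypothetical.

```python
import os
import urllib.request


def detected_proxies():
    """Return the scheme-to-proxy mapping that urllib picks up."""
    return urllib.request.getproxies()


# Set the variables from inside Python instead of the shell (demo only).
os.environ["http_proxy"] = "http://127.0.0.1:3128"
os.environ["https_proxy"] = "http://127.0.0.1:2080"
print(detected_proxies())
```

The printed mapping may contain extra entries if other proxy-related variables are already defined on the machine.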

If your proxy requires HTTP Basic Auth, use the http://user:password@host/ syntax:

proxies = {
    "http": "http://user:pass@127.0.0.1:3128/",
}
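When several proxies are available, a small pool class can hand out one at random and drop those that stop working. The class below is only a hypothetical sketch of that idea, not code from the original article: the name ProxyPool, its methods, and the in-memory list are all assumptions.

```python
import random


class ProxyPool:
    """A tiny in-memory proxy pool (hypothetical sketch)."""

    def __init__(self, proxies):
        # De-duplicate while keeping the original order.
        self._proxies = list(dict.fromkeys(proxies))

    def get(self):
        """Return a random proxy as a requests-style `proxies` dict."""
        if not self._proxies:
            raise RuntimeError("proxy pool is empty")
        proxy = random.choice(self._proxies)
        return {"http": proxy, "https": proxy}

    def remove(self, proxy):
        """Drop a proxy that failed so it is not handed out again."""
        if proxy in self._proxies:
            self._proxies.remove(proxy)

    def __len__(self):
        return len(self._proxies)


pool = ProxyPool(["http://127.0.0.1:3128", "http://127.0.0.1:2080"])
```

A crawler could call pool.get() before each request, pass the result as the proxies argument, and call pool.remove() when a request through that proxy fails.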

Using a proxy in Python is very simple. The most important thing is to find a proxy with a stable, reliable connection. If you have any questions, please leave a message.

source:php.cn