This article shares an example of obtaining proxy IPs with Python. It may be a useful reference for anyone who needs it.
When we crawl data, some websites block repeated visits from the same IP. In that case we can route each visit through a proxy IP to disguise ourselves, so that the "enemy" cannot detect us.
OK, let's get started!
This is the file that gets the proxy IPs. I modularized it into three functions. The code targets Python 2 (it uses urllib2 and print statements).
Note: some of the code comments are in English, which was more convenient while writing the code; a word or two of English is no trouble.
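#!/usr/bin/python
# -*- coding: utf-8 -*-
"""
author: dasuda
"""
import urllib2
import re
import socket
import threading

findIP = []           # raw IP addresses scraped from the page
IP_data = []          # "ip:port" strings after joining each IP with its port
IP_data_checked = []  # proxies that passed the availability check
findPORT = []         # port matching each IP
available_table = []  # indexes (into IP_data) of the usable proxies
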
def getIP(url_target):
    patternIP = re.compile(r'(?<=<td>)[\d]{1,3}\.[\d]{1,3}\.[\d]{1,3}\.[\d]{1,3}')
    patternPORT = re.compile(r'(?<=<td>)[\d]{2,5}(?=</td>)')
    print "now,start to refresh proxy IP..."
    for page in range(1, 4):
        url = 'http://www.xicidaili.com/nn/' + str(page)
        headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64)"}
        request = urllib2.Request(url=url, headers=headers)
        response = urllib2.urlopen(request)
        content = response.read()
        findIP = re.findall(patternIP, str(content))
        findPORT = re.findall(patternPORT, str(content))
        # assemble the ip and port
        for i in range(len(findIP)):
            findIP[i] = findIP[i] + ":" + findPORT[i]
        IP_data.extend(findIP)
        print "get page", page
    print "refresh done!!!"
    # use multithreading
    mul_thread_check(url_target)
    return IP_data_checked

# One lock shared by every checker thread. A lock created inside
# check_one() would be private to each call and would protect nothing.
lock = threading.Lock()

def check_one(url_check, i):
    # setting timeout
    socket.setdefaulttimeout(8)
    try:
        ppp = {"http": IP_data[i]}
        proxy_support = urllib2.ProxyHandler(ppp)
        openercheck = urllib2.build_opener(proxy_support)
        request = urllib2.Request(url_check)
        request.add_header('User-Agent', "Mozilla/5.0 (Windows NT 10.0; WOW64)")
        # Open through this thread's own opener. install_opener() replaces
        # the process-wide opener, so concurrent threads would overwrite
        # each other's proxy.
        html = openercheck.open(request).read()
        with lock:
            print IP_data[i], 'is OK'
            # get available ip index
            available_table.append(i)
    except Exception as e:
        with lock:
            print 'error'

def mul_thread_check(url_mul_check):
    threads = []
    for i in range(len(IP_data)):
        # create thread...
        thread = threading.Thread(target=check_one, args=[url_mul_check, i])
        threads.append(thread)
        thread.start()
        print "new thread start", i
    for thread in threads:
        thread.join()
    # get the IP_data_checked[]
    for error_cnt in range(len(available_table)):
        assemble_ip = {'http': IP_data[available_table[error_cnt]]}
        IP_data_checked.append(assemble_ip)
    print "available proxy ip:", len(available_table)

1. getIP(url_target): main function
Its parameter is the URL used to verify that a proxy IP works; ipchina is recommended.
It obtains the proxy IPs from the http://www.xicidaili.com/nn/ website, which provides free proxy IPs. Not all of them are usable: depending on your geographical location, network conditions, the target server and so on, probably fewer than 20% work, at least in my case.
The http://www.xicidaili.com/nn/ page is fetched in the normal way (without a proxy), and the required IPs and their ports are extracted from the returned page with regular expressions. The code is as follows:
patternIP = re.compile(r'(?<=<td>)[\d]{1,3}\.[\d]{1,3}\.[\d]{1,3}\.[\d]{1,3}')
patternPORT = re.compile(r'(?<=<td>)[\d]{2,5}(?=</td>)')
...
findIP = re.findall(patternIP,str(content))
findPORT = re.findall(patternPORT,str(content))
For how to construct regular expressions, you can refer to other articles.
The IPs found are stored in findIP and the corresponding ports in findPORT; the two lists correspond to each other by index. A page normally yields 100 IPs.
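To make the two patterns concrete, here is a small sketch that runs them against a made-up HTML fragment. The sample rows and the names pattern_ip, pattern_port and sample_html are assumptions for illustration and do not come from the real page; the snippet is written for Python 3, although the regular expressions behave the same way in the Python 2 listing.

```python
import re

# The two patterns from the listing.
pattern_ip = re.compile(r'(?<=<td>)[\d]{1,3}\.[\d]{1,3}\.[\d]{1,3}\.[\d]{1,3}')
pattern_port = re.compile(r'(?<=<td>)[\d]{2,5}(?=</td>)')

# Hypothetical table rows, made up for illustration only.
sample_html = (
    "<tr><td>10.0.0.1</td><td>8080</td><td>HTTP</td></tr>"
    "<tr><td>10.0.0.2</td><td>3128</td><td>HTTP</td></tr>"
)

# The lookbehind (?<=<td>) requires a preceding <td> without capturing it.
ips = pattern_ip.findall(sample_html)
# The lookahead (?=</td>) makes the port pattern match only cells that
# contain nothing but 2 to 5 digits.
ports = pattern_port.findall(sample_html)

print(ips)
print(ports)
```

One caveat: the port pattern matches any table cell that holds only 2 to 5 digits, so if the page had another purely numeric column the two lists would no longer line up by index.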
Next, each IP is spliced together with its port.
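The listing does this splicing with an index loop over findIP. As a possible variant, the same pairing can be sketched with zip(); the helper name join_ip_port and the sample values are hypothetical, and the snippet is written for Python 3.

```python
def join_ip_port(ips, ports):
    """Pair each IP with the port at the same index and return 'ip:port' strings."""
    # zip() stops at the shorter list, so a missing port cannot raise IndexError.
    return [ip + ":" + port for ip, port in zip(ips, ports)]


# Hypothetical values, for illustration only.
print(join_ip_port(["10.0.0.1", "10.0.0.2"], ["8080", "3128"]))
```

Unlike the index loop, this version silently drops any IP that has no matching port instead of failing, which may or may not be what you want.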
Finally, the availability check is run.
2. check_one(url_check,i): thread function
The request to url_check is made in the usual way, only routed through the proxy at index i. If a web page comes back, the proxy IP is available, and its index is recorded so that all available IPs can be extracted later.
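For readers on Python 3, where urllib2 became urllib.request, the check for a single proxy might look like the sketch below. The function name check_proxy and its boolean return value are assumptions made for this example; the listing records an index instead of returning a value.

```python
import http.client
import urllib.request


def check_proxy(url_check, proxy, timeout=8):
    """Return True if url_check can be fetched through the HTTP proxy 'ip:port'."""
    handler = urllib.request.ProxyHandler({"http": proxy})
    # A private opener keeps this check from touching the process-wide opener.
    opener = urllib.request.build_opener(handler)
    request = urllib.request.Request(
        url_check,
        headers={"User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64)"},
    )
    try:
        with opener.open(request, timeout=timeout) as response:
            response.read()
        return True
    except (OSError, http.client.HTTPException):
        # URLError, timeouts and refused connections are all OSError subclasses.
        return False
```

Passing the timeout to open() affects only this request, whereas socket.setdefaulttimeout() in the listing changes the default for every socket in the process.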
3. mul_thread_check(url_mul_check): Multi-thread generation
This function checks proxy availability with multiple threads, starting one thread per IP.
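The one-thread-per-IP pattern can be sketched on its own, separated from the network code. In the Python 3 sketch below, check_all and is_alive are hypothetical names, and the lambda is only a stand-in for a real availability check.

```python
import threading


def check_all(proxies, is_alive):
    """Run is_alive(proxy) on one thread per proxy; return the proxies that passed."""
    lock = threading.Lock()  # created once and shared by all workers
    alive_indexes = []

    def worker(i):
        try:
            ok = is_alive(proxies[i])
        except Exception:
            ok = False
        if ok:
            with lock:  # guard the shared list
                alive_indexes.append(i)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(len(proxies))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait until every check has finished
    # Threads finish in any order, so sort to restore the original order.
    return [proxies[i] for i in sorted(alive_indexes)]


# Hypothetical stand-in for a real network check.
result = check_all(
    ["10.0.0.1:8080", "10.0.0.2:3128", "10.0.0.3:80"],
    lambda p: not p.endswith(":3128"),
)
print(result)
```

With three pages of roughly 100 proxies each, this starts about 300 threads at once. If that turns out to be too many, concurrent.futures.ThreadPoolExecutor from the standard library is one way to cap the number of workers.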
The project simply calls getIP() and passes in the URL used to check availability. It returns a list of the proxies that passed the check, in the format
[{'http': 'ip1:port1'}, {'http': 'ip2:port2'}, ....]
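In the listing, mul_thread_check() stores each usable proxy as a {'http': 'ip:port'} dict, which is the shape ProxyHandler expects. A minimal sketch of how such a list might be used to switch proxies between visits, written for Python 3; the helper name opener_for_random_proxy and the sample list are hypothetical.

```python
import random
import urllib.request


def opener_for_random_proxy(checked_proxies):
    """Pick one checked proxy at random and return (proxy_dict, opener)."""
    proxy = random.choice(checked_proxies)
    opener = urllib.request.build_opener(urllib.request.ProxyHandler(proxy))
    return proxy, opener


# Hypothetical result of getIP(), for illustration only.
checked = [{"http": "10.0.0.1:8080"}, {"http": "10.0.0.2:3128"}]
proxy, opener = opener_for_random_proxy(checked)
print(proxy)
```

Calling opener.open(url) would then send that request through the chosen proxy; picking a new opener before each visit spreads the requests across the checked proxies.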