


Take stock of the differences between the urllib library and requests library in Python
1. Preface
When using a Python crawler, you need to simulate initiating network requests. The main libraries used are the requests library and python built-in For the urllib library, it is generally recommended to use requests, which is a re-encapsulation of urllib.
What is the difference between them?
The following is a detailed explanation of the case to understand the main differences in their use.
2. urllib library
Introduction:The response object of the urllib library is to first create the http and request objects, and then load them into reques.urlopen. http request.
What is returned is http, response object, which is actually html attribute. Use .read().decode() to decode and convert it into str string type. After decoding, Chinese characters can be displayed.
Example:
from urllib import request #请求头 headers = { "User-Agent": 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.186 Safari/537.36' } wd = {"wd": "中国"} url = "http://www.baidu.com/s?" req = request.Request(url, headers=headers) response = request.urlopen(req) print(type(response)) print(response) res = response.read().decode() print(type(res)) print(res)
##Run result:
##Note:
Usually crawl web pages, when constructing http When making a request, you need to add some additional information, such as Useragent, cookie, etc., or add a proxy server. Often these are necessary anti-crawling mechanisms.
## 简介:requests库调用是requests.get方法传入url和参数,返回的对象是Response对象,打印出来是显示响应状态码。 通过.text 方法可以返回是unicode 型的数据,一般是在网页的header中定义的编码形式,而content返回的是bytes,二级制型的数据,还有 .json方法也可以返回json字符串。 如果想要提取文本就用text,但是如果你想要提取图片、文件等二进制文件,就要用content,当然decode之后,中文字符也会正常显示。 requests的优势:Python爬虫时,更建议用requests库。因为requests比urllib更为便捷,requests可以直接构造get,post请求并发起,而urllib.request只能先构造get,post请求,再发起。 例: 运行结果 (可以直接获取整网页的信息,打印控制台): 1. 本文基于Python基础,主要介绍了urllib库和requests库的区别。 2. 在使用urllib内的request模块时,返回体获取有效信息和请求体的拼接需要decode和encode后再进行装载。进行http请求时需先构造get或者post请求再进行调用,header等头文件也需先进行构造。 3. requests是对urllib的进一步封装,因此在使用上显得更加的便捷,建议在实际应用当中尽量使用requests。 4. 希望能给一些对爬虫感兴趣,有一个具体的概念。方法只是一种工具,试着去爬一爬会更容易上手,网络也会有很多的坑,做爬虫更需要大量的经验来应付复杂的网络情况。 5. 希望大家一起探讨学习, 一起进步。 The above is the detailed content of Take stock of the differences between the urllib library and requests library in Python. For more information, please follow other related articles on the PHP Chinese website!三、requests库
import requests
headers = {
"User-Agent": "Mozilla/5.0 (Linux; U; Android 8.1.0; zh-cn; BLA-AL00 Build/HUAWEIBLA-AL00) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/57.0.2987.132 MQQBrowser/8.9 Mobile Safari/537.36"
}
wd = {"wd": "中国"}
url = "http://www.baidu.com/s?"
response = requests.get(url, params=wd, headers=headers)
data = response.text
data2 = response.content
print(response)
print(type(response))
print(data)
print(type(data))
print(data2)
print(type(data2))
print(data2.decode())
print(type(data2.decode()))
四、总结

Hot AI Tools

Undress AI Tool
Undress images for free

Undresser.AI Undress
AI-powered app for creating realistic nude photos

AI Clothes Remover
Online AI tool for removing clothes from photos.

Clothoff.io
AI clothes remover

Video Face Swap
Swap faces in any video effortlessly with our completely free AI face swap tool!

Hot Article

Hot Tools

Notepad++7.3.1
Easy-to-use and free code editor

SublimeText3 Chinese version
Chinese version, very easy to use

Zend Studio 13.0.1
Powerful PHP integrated development environment

Dreamweaver CS6
Visual web development tools

SublimeText3 Mac version
God-level code editing software (SublimeText3)

Install pyodbc: Use the pipinstallpyodbc command to install the library; 2. Connect SQLServer: Use the connection string containing DRIVER, SERVER, DATABASE, UID/PWD or Trusted_Connection through the pyodbc.connect() method, and support SQL authentication or Windows authentication respectively; 3. Check the installed driver: Run pyodbc.drivers() and filter the driver name containing 'SQLServer' to ensure that the correct driver name is used such as 'ODBCDriver17 for SQLServer'; 4. Key parameters of the connection string

Pythoncanbeoptimizedformemory-boundoperationsbyreducingoverheadthroughgenerators,efficientdatastructures,andmanagingobjectlifetimes.First,usegeneratorsinsteadofliststoprocesslargedatasetsoneitematatime,avoidingloadingeverythingintomemory.Second,choos

shutil.rmtree() is a function in Python that recursively deletes the entire directory tree. It can delete specified folders and all contents. 1. Basic usage: Use shutil.rmtree(path) to delete the directory, and you need to handle FileNotFoundError, PermissionError and other exceptions. 2. Practical application: You can clear folders containing subdirectories and files in one click, such as temporary data or cached directories. 3. Notes: The deletion operation is not restored; FileNotFoundError is thrown when the path does not exist; it may fail due to permissions or file occupation. 4. Optional parameters: Errors can be ignored by ignore_errors=True

Use psycopg2.pool.SimpleConnectionPool to effectively manage database connections and avoid the performance overhead caused by frequent connection creation and destruction. 1. When creating a connection pool, specify the minimum and maximum number of connections and database connection parameters to ensure that the connection pool is initialized successfully; 2. Get the connection through getconn(), and use putconn() to return the connection to the pool after executing the database operation. Constantly call conn.close() is prohibited; 3. SimpleConnectionPool is thread-safe and is suitable for multi-threaded environments; 4. It is recommended to implement a context manager in combination with context manager to ensure that the connection can be returned correctly when exceptions are noted;

Introduction to Statistical Arbitrage Statistical Arbitrage is a trading method that captures price mismatch in the financial market based on mathematical models. Its core philosophy stems from mean regression, that is, asset prices may deviate from long-term trends in the short term, but will eventually return to their historical average. Traders use statistical methods to analyze the correlation between assets and look for portfolios that usually change synchronously. When the price relationship of these assets is abnormally deviated, arbitrage opportunities arise. In the cryptocurrency market, statistical arbitrage is particularly prevalent, mainly due to the inefficiency and drastic fluctuations of the market itself. Unlike traditional financial markets, cryptocurrencies operate around the clock and their prices are highly susceptible to breaking news, social media sentiment and technology upgrades. This constant price fluctuation frequently creates pricing bias and provides arbitrageurs with

iter() is used to obtain the iterator object, and next() is used to obtain the next element; 1. Use iterator() to convert iterable objects such as lists into iterators; 2. Call next() to obtain elements one by one, and trigger StopIteration exception when the elements are exhausted; 3. Use next(iterator, default) to avoid exceptions; 4. Custom iterators need to implement the __iter__() and __next__() methods to control iteration logic; using default values is a common way to safe traversal, and the entire mechanism is concise and practical.

Install the corresponding database driver; 2. Use connect() to connect to the database; 3. Create a cursor object; 4. Use execute() or executemany() to execute SQL and use parameterized query to prevent injection; 5. Use fetchall(), etc. to obtain results; 6. Commit() is required after modification; 7. Finally, close the connection or use a context manager to automatically handle it; the complete process ensures that SQL operations are safe and efficient.

threading.Timer executes functions asynchronously after a specified delay without blocking the main thread, and is suitable for handling lightweight delays or periodic tasks. ①Basic usage: Create Timer object and call start() method to delay execution of the specified function; ② Cancel task: Calling cancel() method before the task is executed can prevent execution; ③ Repeating execution: Enable periodic operation by encapsulating the RepeatingTimer class; ④ Note: Each Timer starts a new thread, and resources should be managed reasonably. If necessary, call cancel() to avoid memory waste. When the main program exits, you need to pay attention to the influence of non-daemon threads. It is suitable for delayed operations, timeout processing, and simple polling. It is simple but very practical.
