Can crawler technology crawl https?

silencement · Original: 2019-05-29 13:55:23 · 6674 views

First of all, let’s understand what https is

HTTPS is HTTP + SSL. In short, what used to be sent as plaintext over HTTP is now encrypted before it is transmitted. The encryption method and the secret key are agreed on before any data is sent, so even if the traffic is captured or tampered with in transit, the information is not leaked.

The essence of a crawler is to act as a browser: it sends requests to the server and takes part in the whole exchange, including the encryption setup. HTTPS links can therefore be crawled as well, provided the crawler's client has the correct CA certificates available to verify the server's SSL certificate.
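As a rough illustration of the point above, Python's standard library already behaves like a browser in this respect: its default client-side TLS context requires a valid server certificate and checks the hostname. The sketch below uses only the ssl and urllib.request modules. The helper names describe_default_tls_context and fetch_status are made up for this example and are not part of any library.

```python
import ssl
import urllib.request


def describe_default_tls_context():
    """Report how Python's default client TLS context treats certificates."""
    ctx = ssl.create_default_context()
    return {
        "verifies_certificate": ctx.verify_mode == ssl.CERT_REQUIRED,
        "checks_hostname": ctx.check_hostname,
    }


def fetch_status(url):
    """Fetch a URL over HTTPS with default verification and return the status code.

    Needs network access. A certificate problem surfaces as urllib.error.URLError.
    """
    ctx = ssl.create_default_context()
    with urllib.request.urlopen(url, context=ctx, timeout=10) as resp:
        return resp.status
```

Calling fetch_status("https://www.baidu.com/") on a machine with a working CA store should succeed without any extra configuration, which is the normal case for a crawler.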

Find the source of the error

When a running crawler reports an SSL error, the cause is usually one of two things: the local certificates or the related SSL library are not installed correctly, or the server uses a certificate issued by its own CA, which is not recognized by an authoritative certificate authority.
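To check the first of these causes, it can help to look at where the local OpenSSL build expects its trusted CA certificates and whether those locations exist. The following is a minimal sketch using the standard library's ssl.get_default_verify_paths(); local_ca_locations is a hypothetical helper name. Note that requests normally uses the CA bundle shipped by the certifi package rather than these paths, so this only describes the standard library's view.

```python
import os
import ssl


def local_ca_locations():
    """Return the CA file and CA directory OpenSSL looks for, and whether each exists."""
    paths = ssl.get_default_verify_paths()
    report = {}
    for label, location in (("cafile", paths.openssl_cafile),
                            ("capath", paths.openssl_capath)):
        report[label] = {
            "path": location,
            # An empty or missing location counts as "does not exist".
            "exists": bool(location) and os.path.exists(location),
        }
    return report
```

If neither location exists, a broken or incomplete local certificate installation is a plausible suspect; if both exist and the error persists, the server's certificate is the more likely culprit.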

Solving certificate exception issues

For CA certificate issues, we can refer to the following solutions:

1. Do not verify the CA certificate, and suppress the resulting security warning

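    # coding=utf-8
    import requests
    import urllib3
    from requests.packages.urllib3.exceptions import InsecureRequestWarning

    # Skipping CA verification raises an InsecureRequestWarning on every
    # request, so the warning has to be suppressed. Either option is enough.
    # Option 1:
    urllib3.disable_warnings()
    # Option 2:
    requests.packages.urllib3.disable_warnings(InsecureRequestWarning)

    # verify=False turns off certificate verification for this request.
    r = requests.get(url="https://www.baidu.com/", verify=False)
    print(r.elapsed.total_seconds())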

2. Specify the path to the certificate file, or to a folder containing the certificates (such a folder must be prepared with the OpenSSL tools, for example c_rehash)

    # coding=utf-8
    import requests

    # verify takes the path to a CA bundle file or a directory of CA certificates.
    r = requests.get(url="https://www.baidu.com/", verify='/path/to/certfile')
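Since verify accepts either a file or a directory, a small check before the request can give a clearer failure than an SSL error raised deep inside the library. This is only a sketch: resolve_verify is a hypothetical helper, not part of requests, and the path in the usage comment is the same placeholder as above.

```python
import os


def resolve_verify(ca_path):
    """Return a value suitable for the verify= argument of requests.get().

    ca_path may be a CA bundle file, a directory of CA certificates, or None.
    Raises FileNotFoundError early if a given path does not exist.
    """
    if ca_path is None:
        return True  # fall back to the library's default CA bundle
    if os.path.isfile(ca_path) or os.path.isdir(ca_path):
        return os.path.abspath(ca_path)
    raise FileNotFoundError("CA bundle or directory not found: %s" % ca_path)


# Usage (assumes the requests package is installed and the path is real):
#   import requests
#   r = requests.get("https://www.baidu.com/",
#                    verify=resolve_verify("/path/to/certfile"))
```

Returning True for None keeps normal verification in place, so the helper never silently disables certificate checks.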
