
Code example for php to crawl images and save them locally

不言 · 2019-01-28 09:51:04

This article shows how to crawl an image from a web page with PHP and save it locally.

It reviews the usage of a few PHP functions through a simple example.

Functions and concepts used

  • curl: sending network requests

  • preg_match: regular expression matching

Code

$url     = 'http://desk.zol.com.cn/bizhi/7386_91671_2.html';
$headers = [
    'user-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36'
];
$ch      = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
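curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);     // return the response from curl_exec() as a string instead of printing it
curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);     // request headers belong in CURLOPT_HTTPHEADER; CURLOPT_HEADER is a boolean that adds response headers to the output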
$output = curl_exec($ch);
curl_close($ch);
$str = mb_convert_encoding($output, 'utf-8', 'gb2312');
// or: $str = iconv('gb2312', 'utf-8//IGNORE', $output);

preg_match('!<img id="bigImg" src="(?<src>http.*\.(?<ext>jpg|png))".*>!', $str, $m);
file_put_contents('./meinv.' . $m['ext'], file_get_contents($m['src']));

Result

Running the script saves the image in the current directory as meinv.jpg or meinv.png, depending on the extension that was matched.

Explanation

curl sends a request

The steps to establish a curl connection in PHP are generally: initialization, setting options, performing operations, and releasing the connection.

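$ch = curl_init();                       // 1. initialize
curl_setopt($ch, CURLOPT_XXX, $value);   // 2. set options (CURLOPT_XXX stands for any option constant)
$out = curl_exec($ch);                   // 3. perform the request
curl_close($ch);                         // 4. release the handle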
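The four steps can be wrapped in a small helper. The following is a minimal sketch, not the original author's code: the function name fetch_page(), the timeout parameter, and the error handling are assumptions added for illustration.

```php
<?php
/**
 * Fetch a URL with curl and return the response body.
 * fetch_page() is a hypothetical helper name, not part of the original script.
 */
function fetch_page(string $url, array $headers = [], int $timeout = 10): string
{
    if ($url === '') {
        throw new InvalidArgumentException('URL must not be empty');
    }

    // 1. Initialization
    $ch = curl_init();

    // 2. Setting options
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);   // return the body instead of printing it
    curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);   // array of 'Name: value' strings
    curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);      // give up after $timeout seconds

    // 3. Performing the request
    $body  = curl_exec($ch);
    $error = curl_error($ch);

    // 4. Releasing the handle (has no effect since PHP 8.0, kept to mirror the steps)
    curl_close($ch);

    if ($body === false) {
        throw new RuntimeException('curl request failed: ' . $error);
    }
    return $body;
}
```

With this helper, the request in the example script would become a single call such as fetch_page($url, $headers), and a failed request raises an exception instead of silently passing false to the later steps.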

Commonly used CURLOPT settings are listed below. For the full list, see the documentation at http://php.net/manual/zh/function.curl-setopt.php

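CURLOPT_URL, string // the URL to request; required
CURLOPT_HTTPHEADER, array // request headers, as an array of 'Name: value' strings
CURLOPT_HEADER, bool // when true, include the response headers in the output
CURLOPT_RETURNTRANSFER, bool // when true, curl_exec() returns the response as a string instead of printing it
CURLOPT_SSL_VERIFYPEER, bool // when false, the HTTPS certificate is not verified (insecure; use only for testing)
CURLOPT_POST, bool // when true, send a POST request together with CURLOPT_POSTFIELDS; GET is the default
CURLOPT_POSTFIELDS, array|string // the POST data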

Printing $output directly shows garbled characters. The page source reveals that the page is encoded in gb2312, so convert it to utf-8 with mb_convert_encoding or iconv before output.
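Instead of hard-coding the source encoding, the charset can be read from the page's meta tag. This is a rough sketch under stated assumptions: html_to_utf8() is a hypothetical helper, the meta-tag regex is a simplification, and GB2312 is used as the fallback only because that is what this page uses.

```php
<?php
/**
 * Convert an HTML string to UTF-8, using the charset declared in its <meta> tag.
 * html_to_utf8() is a hypothetical helper; $defaultCharset is used when no
 * charset declaration is found.
 */
function html_to_utf8(string $html, string $defaultCharset = 'GB2312'): string
{
    $charset = $defaultCharset;

    // Matches both <meta charset="gb2312"> and
    // <meta http-equiv="Content-Type" content="text/html; charset=gb2312">
    if (preg_match('/<meta[^>]+charset=["\']?([\w-]+)/i', $html, $m)) {
        $charset = $m[1];
    }

    // Nothing to do if the page is already UTF-8.
    if (strcasecmp($charset, 'utf-8') === 0) {
        return $html;
    }

    // Requires the mbstring extension; an unknown charset name raises an error.
    return mb_convert_encoding($html, 'UTF-8', $charset);
}
```

In the example script, $str = html_to_utf8($output) would take the place of the explicit mb_convert_encoding call.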

preg_match regular expression matching

Looking at the page source, the image tag we need has the form <img id="bigImg" src="..." ...>, where src holds the image URL.

Regular Expression

<img id="bigImg" src="(?<src>http.*\.(?<ext>jpg|png))".*>

.* matches any run of characters, and (?<name>...) defines a named group, so the captured part can be read from the match array by name, for example $m['src'] or $m['ext'] in the code above.
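The named groups can be tried out on a sample string without any network access. The sketch below is illustrative: extract_big_image() is a hypothetical helper, the example.com URL is a placeholder, and the pattern uses [^"]* in place of the article's greedy .* so that the match cannot run past the closing quote of the src attribute.

```php
<?php
/**
 * Extract the big image URL and its extension from the page HTML.
 * extract_big_image() is a hypothetical helper; it returns null when nothing matches.
 */
function extract_big_image(string $html): ?array
{
    // (?<src>...) and (?<ext>...) are named groups; [^"]* stops at the closing quote.
    $pattern = '!<img id="bigImg" src="(?<src>http[^"]*\.(?<ext>jpg|png))"!';

    if (preg_match($pattern, $html, $m) !== 1) {
        return null;
    }
    return ['src' => $m['src'], 'ext' => $m['ext']];
}

// Sample markup with a placeholder URL (example.com is an assumption, not the real page).
$sample = '<p><img id="bigImg" src="http://example.com/images/wallpaper.jpg" width="960"></p>';
$image  = extract_big_image($sample);

echo $image['src'], PHP_EOL;   // the full image URL
echo $image['ext'], PHP_EOL;   // the file extension
```

Checking the return value of preg_match matters here: if the page layout changes and nothing matches, the match array is empty and reading its keys would only produce warnings.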

Finally, $m['src'] holds the real URL of the image. Fetch it with file_get_contents and save it with file_put_contents, and the job is done.
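The saving step can also be made a little more defensive. This is one possible sketch, not the original approach: save_image() is a hypothetical helper, and the allowed extensions simply mirror the jpg|png alternatives in the regular expression.

```php
<?php
/**
 * Write downloaded image bytes to "<dir>/<name>.<ext>" and return the path.
 * save_image() is a hypothetical helper name used for illustration.
 */
function save_image(string $bytes, string $ext, string $dir = '.', string $name = 'meinv'): string
{
    if ($bytes === '') {
        throw new RuntimeException('No image data to save');
    }
    // Only accept the extensions the regular expression can capture.
    if (!in_array($ext, ['jpg', 'png'], true)) {
        throw new InvalidArgumentException('Unexpected extension: ' . $ext);
    }

    $path = rtrim($dir, '/') . '/' . $name . '.' . $ext;

    // file_put_contents() returns false on failure, so check it explicitly.
    if (file_put_contents($path, $bytes) === false) {
        throw new RuntimeException('Could not write ' . $path);
    }
    return $path;
}
```

A call such as save_image(file_get_contents($m['src']), $m['ext']) would then replace the last line of the example script, assuming the download itself succeeded.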



Statement:
This article is reproduced from cnblogs.com. In case of infringement, please contact admin@php.cn to have it removed.