
Tutorial on capturing remote images with PHP

高洛峰
2016-10-20 13:48:51

While working on WeChat login, I found that WeChat avatar URLs carry no file extension. The usual way of capturing a remote image, which reads the extension from the URL, does not work for them, so they need special handling. I later combined the various cases, wrapped them in a class, and am sharing it here.

Create project

For the demonstration, create a project named grabimg in the www root directory, containing a class file GrabImage.php and an index.php.

Write class code

We define a class with the same name as the file: GrabImage

class GrabImage{}

Attributes

Next, define the attributes the class needs.

1. $img_url: the address of the image to capture.

2. $file_name: the name of the saved file, without the extension. The extension is kept separate because it may need to be replaced.

3. $extension: the file extension.

4. $file_dir: the local directory the remote image is saved to, generally relative to the PHP entry file. This path is normally not stored in the database.

5. $save_dir: the directory that is stored in the database. We do not store the full file path in the database, so that the path is easy to change if the system is later migrated. Here $save_dir is a date-based directory, and the file name is appended to it when the path is returned. When you need the file, read this value and prepend the required base path.
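To make the split between the local directory and the stored path concrete, here is a minimal sketch of how a stored relative path might be turned back into a full path when it is read. The function name resolveImagePath and both base directories are assumptions for illustration; they are not part of the original class.

```php
<?php
// Hypothetical helper: join a configurable base directory with the
// relative path stored in the database (e.g. "201610/19/example.jpg").
function resolveImagePath(string $baseDir, string $relativePath): string
{
    // Trim slashes on both sides of the join so the result never
    // contains "//" and never lacks the separating "/".
    return rtrim($baseDir, '/') . '/' . ltrim($relativePath, '/');
}

// The database row only holds the relative part.
$stored = '201610/19/example.jpg';

// Before a migration the files live under ./uploads/image ...
echo resolveImagePath('./uploads/image', $stored), PHP_EOL;
// ... afterwards only the base changes; the stored rows stay untouched.
echo resolveImagePath('/data/static/img/', $stored), PHP_EOL;
```

Because only the base directory lives in configuration, moving the files means changing one setting rather than rewriting every row.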

Methods

With the attributes defined, we can start on the capture logic.

First we define a public method, getInstances, which receives the address of the image to capture and the local base directory for saving, and stores the derived values in the attributes.

public function getInstances($img_url , $base_dir)
{
    $this->img_url = $img_url;
    $this->save_dir = date("Ym").'/'.date("d").'/'; // e.g. 201610/19/
    $this->file_dir = $base_dir.'/'.$this->save_dir; // e.g. ./uploads/image/201610/19/
}

The save path is now assembled. One thing needs attention: whether the directory exists. The date changes every day, but the directory is not created automatically. So before saving the image we need to check, and if the directory does not exist, create it right away.

We create a method, setDir, to set up the directory. We declare it private so that it can only be called from inside the class.

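/**
 * Check whether the directory the image will be saved in exists.
 * If it does not, create it immediately.
 * @return bool
 */
private function setDir()
{
    if(!file_exists($this->file_dir))
    {
        mkdir($this->file_dir,0777,TRUE);
    }

    $this->file_name = uniqid().rand(10000,99999);// File name. This is only a demo; in a real project use your own unique file name generator

    return true;
}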
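The setDir method above returns true even when mkdir fails, for example because of missing permissions. A minimal sketch of a stricter check follows. The function name ensureDir and the 0755 default mode are assumptions, not part of the original class.

```php
<?php
// Hypothetical helper: make sure a directory exists and report whether it does.
function ensureDir(string $dir, int $mode = 0755): bool
{
    if (is_dir($dir)) {
        return true;
    }
    // @ silences the warning mkdir() raises on failure, for example when
    // another request created the directory between the check and this call.
    if (@mkdir($dir, $mode, true)) {
        return true;
    }
    // mkdir() failed: succeed only if the directory exists anyway.
    return is_dir($dir);
}

// Build a nested, date-based directory under the system temp directory.
$dir = sys_get_temp_dir() . '/grabimg_demo_' . uniqid() . '/' . date('Ym') . '/' . date('d');

var_dump(ensureDir($dir)); // first call creates the nested directories
var_dump(ensureDir($dir)); // second call finds them already there
```

Returning the real outcome lets the caller stop before attempting a download that could not be saved.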

Next comes the core capture code.

The first problem to solve is that the image to capture may have no extension. The traditional approach downloads the image and then cuts the extension off the URL, which does not work here.

We have to determine the image type another way: read the HTTP response headers for the image URL, take the MIME type from them, and derive the extension from that.

For convenience, first define a mapping from MIME types to file extensions.

$mimes=array(
    'image/bmp'=>'bmp',
    'image/gif'=>'gif',
    'image/jpeg'=>'jpg',
    'image/png'=>'png',
    'image/x-icon'=>'ico'
);

With this mapping, a type of image/gif tells us the file is a .gif image.

Use the PHP function get_headers to fetch the response headers. When the result is not false, we assign it to $headers and take the value of Content-Type, which is the MIME type.

if(($headers=get_headers($this->img_url, 1))!==false){
    // Get the type of the response
    $type=$headers['Content-Type'];
}

Using the mapping table we defined above, we can easily obtain the suffix name.

$this->extension=$mimes[$type];

Of course, the $type obtained above may be missing from our mapping table, which means the file is not a type we want, so we simply skip it.
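A direct lookup of $headers['Content-Type'] can miss in a few situations: header names are case-insensitive, the value may carry parameters such as a charset, and when get_headers is called with its second argument set, a header that appears several times (for example after a redirect) comes back as an array. The following is a minimal sketch of a more tolerant lookup. The function name extractMimeType is hypothetical, and the header arrays are simulated so the example runs without network access.

```php
<?php
// Hypothetical helper: pull a bare MIME type out of the array that
// get_headers($url, 1) returns. Returns null when none is found.
function extractMimeType(array $headers): ?string
{
    // Header names are case-insensitive, so normalise the keys first.
    $headers = array_change_key_case($headers, CASE_LOWER);
    if (!isset($headers['content-type'])) {
        return null;
    }
    $value = $headers['content-type'];
    // A header that appears several times is returned as an array;
    // the last entry belongs to the final response.
    if (is_array($value)) {
        $value = end($value);
    }
    // Drop parameters such as "; charset=utf-8" and normalise case.
    $mime = strtolower(trim(explode(';', (string) $value)[0]));
    return $mime === '' ? null : $mime;
}

$mimes = array('image/gif' => 'gif', 'image/jpeg' => 'jpg', 'image/png' => 'png');

// Simulated header arrays standing in for real get_headers() results.
$simple     = array('Content-Type' => 'image/png');
$redirected = array('content-type' => array('text/html; charset=utf-8', 'Image/JPEG'));

foreach (array($simple, $redirected) as $headers) {
    $type = extractMimeType($headers);
    echo ($type !== null && isset($mimes[$type])) ? $mimes[$type] : 'unsupported', PHP_EOL;
}
```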

The remaining steps are the same as for a traditional file capture.

$file_path = $this->file_dir.$this->file_name.".".$this->extension;
// Fetch the data and save it
$contents=file_get_contents($this->img_url);
if(file_put_contents($file_path , $contents))
{
    // The returned value is the path + file name stored directly in the database, e.g. 201610/19/57feefd7e2a7aY5p7LsPqaI-lY1BF.jpg
    return $this->save_dir.$this->file_name.".".$this->extension;
}

First build the full local path for the image, $file_path, then use file_get_contents to fetch the data and file_put_contents to write it to that path.

Finally, we return the path that can be stored directly in the database, rather than the local storage path.
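One detail worth noting: when file_get_contents fails it returns false, and passing that result straight to file_put_contents may leave an empty file behind. Here is a minimal sketch of the same fetch-and-save step with explicit checks. The function name fetchAndSave is hypothetical, and local temporary files stand in for the remote URL so the example runs offline.

```php
<?php
// Hypothetical helper: copy the bytes at $source to $destPath.
// Returns true only when data was actually read and fully written.
function fetchAndSave(string $source, string $destPath): bool
{
    // @ hides the warning on failure; the return value is checked instead.
    $contents = @file_get_contents($source);
    // false means the read failed; an empty string is not a usable image.
    if ($contents === false || $contents === '') {
        return false;
    }
    // file_put_contents() returns the number of bytes written, or false.
    $written = @file_put_contents($destPath, $contents);
    return $written === strlen($contents);
}

// Local files stand in for the remote image.
$source = tempnam(sys_get_temp_dir(), 'src');
file_put_contents($source, 'fake image bytes');
$dest = sys_get_temp_dir() . '/grabimg_copy_' . uniqid() . '.bin';

var_dump(fetchAndSave($source, $dest));              // succeeds: the bytes are copied
var_dump(fetchAndSave($source . '.missing', $dest)); // fails: there is nothing to read
```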

The complete version of this capture method is:

private function getRemoteImg()
{
    // Mapping from MIME type to extension
    $mimes=array(
        'image/bmp'=>'bmp',
        'image/gif'=>'gif',
        'image/jpeg'=>'jpg',
        'image/png'=>'png',
        'image/x-icon'=>'ico'
    );
    // Fetch the response headers
    if(($headers=get_headers($this->img_url, 1))!==false)
    {
        // Get the type of the response
        $type=$headers['Content-Type'];
        // If it is one of the types we want
        if(isset($mimes[$type]))
        {
            $this->extension=$mimes[$type];
            $file_path = $this->file_dir.$this->file_name.".".$this->extension;
            // Fetch the data and save it
            $contents=file_get_contents($this->img_url);
            if(file_put_contents($file_path , $contents))
            {
                // The returned value is the path + file name stored directly in the database, e.g. 201610/19/57feefd7e2a7aY5p7LsPqaI-lY1BF.jpg
                return $this->save_dir.$this->file_name.".".$this->extension;
            }
        }
    }
    return false;
}
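The method above trusts the Content-Type header sent by the remote server. As a complementary check, the type can also be inferred from the downloaded bytes themselves. The sketch below uses PHP's built-in getimagesizefromstring for that. The function name extensionFromBytes is hypothetical, the mapping is trimmed to three common types, and a tiny embedded GIF stands in for downloaded data.

```php
<?php
// Hypothetical helper: infer the extension from the image bytes themselves.
// Returns null when the data is not one of the accepted image types.
function extensionFromBytes(string $bytes): ?string
{
    $mimes = array(
        'image/gif'  => 'gif',
        'image/jpeg' => 'jpg',
        'image/png'  => 'png',
    );
    // getimagesizefromstring() reads the image header; @ hides any
    // notice it may raise on data it does not recognise.
    $info = @getimagesizefromstring($bytes);
    if ($info === false || !isset($mimes[$info['mime']])) {
        return null;
    }
    return $mimes[$info['mime']];
}

// A 1x1 GIF, base64-encoded, stands in for the downloaded data.
$gif = base64_decode('R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7');

var_dump(extensionFromBytes($gif));           // recognised as a GIF
var_dump(extensionFromBytes('not an image')); // rejected
```

Checking the bytes costs nothing extra when the image has already been downloaded, and it guards against a server that labels non-image content with an image type.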

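Finally, for simplicity, we want callers elsewhere to complete a capture by calling a single method. So we put the capture step directly into getInstances: once the paths are configured, it captures right away. Add the following code to the end of the configuration method getInstances.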

if($this->setDir())
{
    return $this->getRemoteImg();
}
else
{
    return false;
}

Test

Let's try it in the index.php file we just created.

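<?php
require 'GrabImage.php'; // $img_url and $base_dir must be assigned before the call below
(new GrabImage())->getInstances($img_url , $base_dir);
?>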

Sure enough, the image was captured.


Complete code


<?php
/**
 * @link bidianer.com
 */
class GrabImage{

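    /**
     * @var string Address of the remote image to capture
     * e.g. http://www.bidianer.com/img/icon_mugs.jpg
     * Some remote file paths may have no extension,
     * e.g. http://www.xxx.com/img/icon_mugs/q/0
     */
    private $img_url;

    /**
     * @var string Name of the file to save
     * A new name is generated for the local copy,
     * without the extension,
     * e.g. 57feefd7e2a7aY5p7LsPqaI-lY1BF
     */
    private $file_name;

    /**
     * @var string File extension
     * Determined from the Content-Type response header,
     * so it also works for remote images without an extension,
     * e.g. jpg
     */
    private $extension;

    /**
     * @var string Local directory the file is saved in
     * This is the path PHP writes the file to,
     * generally relative to the entry file,
     * e.g. ./uploads/image/201610/19/
     * This path is normally not stored in the database directly
     */
    private $file_dir;

    /**
     * @var string File directory stored in the database
     * This is the image path saved directly to the database,
     * generally date + file name; prepend the base path when using it.
     * This makes it easy to change the path when migrating the system,
     * e.g. 201610/19/
     */
    private $save_dir;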

    /**
     * @param string $img_url Address of the image to capture
     * @param string $base_dir Local base directory for saving, e.g. ./uploads/image, without a trailing slash "/"
     * @return string|false Relative path to store in the database, or false on failure
     */
    public function getInstances($img_url , $base_dir)
    {
        $this->img_url = $img_url;
        $this->save_dir = date("Ym").'/'.date("d").'/'; // e.g. 201610/19/
        $this->file_dir = $base_dir.'/'.$this->save_dir; // e.g. ./uploads/image/201610/19/
        return $this->start();
    }

    /**
     * Start capturing the image
     */
    private function start()
    {
        if($this->setDir())
        {
            return $this->getRemoteImg();
        }
        else
        {
            return false;
        }
    }

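    /**
     * Check whether the directory the image will be saved in exists.
     * If it does not, create it immediately.
     * @return bool
     */
    private function setDir()
    {
        if(!file_exists($this->file_dir))
        {
            mkdir($this->file_dir,0777,TRUE);
        }

        $this->file_name = uniqid().rand(10000,99999);// File name. This is only a demo; in a real project use your own unique file name generator

        return true;
    }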

    /**
     * Core method for capturing a remote image; handles images with and without an extension
     *
     * @return string|false Relative path to store in the database, or false on failure
     */
    private function getRemoteImg()
    {
        // Mapping from MIME type to extension
        $mimes=array(
            'image/bmp'=>'bmp',
            'image/gif'=>'gif',
            'image/jpeg'=>'jpg',
            'image/png'=>'png',
            'image/x-icon'=>'ico'
        );
        // Fetch the response headers
        if(($headers=get_headers($this->img_url, 1))!==false)
        {
            // Get the type of the response
            $type=$headers['Content-Type'];
            // If it is one of the types we want
            if(isset($mimes[$type]))
            {
                $this->extension=$mimes[$type];
                $file_path = $this->file_dir.$this->file_name.".".$this->extension;
                // Fetch the data and save it
                $contents=file_get_contents($this->img_url);
                if(file_put_contents($file_path , $contents))
                {
                    // The returned value is the path + file name stored directly in the database, e.g. 201610/19/57feefd7e2a7aY5p7LsPqaI-lY1BF.jpg
                    return $this->save_dir.$this->file_name.".".$this->extension;
                }
            }
        }
        return false;
    }
}

