This article shows how to crawl web page content and images with PHP, using two examples.
Example 1:
<?php include_once("curl.php"); /* You need to provide and configure this file yourself */
header("content-type:text/html;charset=utf8");
$pattern_title = "/<title>(.+)<\/title>/"; // matches the page title
$pattern_code = "/<tr><td>
<p>(.+)/p><script></script>";
}
echo "</p>
<hr>";
/*$trans = array(" "=>",", "<br>"=>"。");
$TRANS_CONTENT = strtr($DATA_CONTENT, $trans);
echo $TRANS_CONTENT;
*/
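$DATA_CONTENT=preg_replace('/\s(?=\s)/', ' ', $DATA_CONTENT); // (?=pattern) is a positive lookahead: match a whitespace character only when another one follows
$DATA_CONTENT=preg_replace('/[\n\r\t]/', "\r\n", $DATA_CONTENT); // normalize newlines, carriage returns and tabs to CRLF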
$DATA_CONTENT=preg_replace('/&nbsp;/', ' ', $DATA_CONTENT); // replace non-breaking space entities with plain spaces
$num=preg_match_all($pattern_code, $DATA_CONTENT, $match_code);
for($i=0;$i";
}
?>
Example 2:
<?php
/*
author: ssh_kobe
date: 20110615
*/
set_time_limit(0); // no time limit for the crawl
function get_pic($pic_url) {
    // Fetch the HTML of the page
    $data=CurlGet($pic_url);
    /* Use a regular expression to extract the image links
    $pattern_src = '/<img .*?\"([^\"]*(jpg|bmp|jpeg|gif)).*?>/';*/
    $pattern_src = '/<img .*?src\=\"([^\"]*\.jpg).*?>/'; // match JPG images only
    $num = preg_match_all($pattern_src, $data, $match_src);
    $arr_src=$match_src[1]; // array of image URLs
    //get_name($arr_src);
    get_name_2($arr_src);
    echo 'End!!!<br>';
    return 0;
}
function get_pic_2($pic_url, $base_site) {
    // Fetch the HTML of the page
    $data=CurlGet($pic_url);
    /* Use a regular expression to extract the image links */
    $pattern_src = '/<img .*?\"([^\"]*jpg).*?>/'; // match JPG images only
    $num = preg_match_all($pattern_src, $data, $match_src);
    $arr_src=$match_src[1]; // array of image URLs
    $arr_src=rev_site($arr_src, $base_site);
    get_name($arr_src);
    echo 'End!!!<br>';
    return 0;
}
/* Convert relative image addresses to absolute ones */
function rev_site($site_list, $base_site){
    $return_list = array(); // initialize so an empty input still returns an array
    foreach($site_list as $site_item) {
        if (preg_match('/^http/', $site_item)) {
            $return_list[] = $site_item;
        }else{
            $return_list[] = $base_site.$site_item;
        }
    }
    return $return_list;
}
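/* Determine the image type and save the image in the same directory as this file */
function get_name($pic_arr)
{
    // Recognized image extensions
    $pattern_type = '/(\.(jpg|bmp|jpeg|gif|png))/';
    foreach($pic_arr as $pic_item){ // loop over each image URL
        if (!preg_match_all($pattern_type,$pic_item,$match_type)) {
            continue; // skip URLs without a recognized image extension
        }
        $pic_name = get_unique().$match_type[1][0]; // name the file with a microsecond timestamp
        // Write the image data to disk in binary mode
        $write_fd = @fopen($pic_name,"wb");
        if ($write_fd === false) {
            continue; // the target file could not be opened
        }
        fwrite($write_fd, CurlGet($pic_item));
        fclose($write_fd);
        echo "OK..";
    }
    return 0;
}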
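function get_name_2($pic_arr)
{
    // Captures the file name (number and extension) after the last slash
    $pattern_type = '/.*\/(.*?)$/';
    foreach($pic_arr as $pic_item){ // loop over each image URL
        $num = preg_match_all($pattern_type,$pic_item,$match_type);
        if (!$num || $match_type[1][0] === '') {
            continue; // no usable file name in this URL
        }
        // Write the image data to disk in binary mode
        $write_fd = @fopen($match_type[1][0],"wb");
        if ($write_fd === false) {
            continue; // the target file could not be opened
        }
        fwrite($write_fd, CurlGet($pic_item));
        fclose($write_fd);
        echo 'OK..';
    }
    return 0;
}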
// Build a unique ID from the current time in microseconds
function get_unique(){
    list($msec, $sec) = explode(" ",microtime());
    return $sec.intval($msec*1000000);
}
// Fetch the content of a URL
function CurlGet($url){
    $url=str_replace('&amp;','&',$url); // decode HTML-escaped ampersands taken from the page source
    $curl = curl_init();
    curl_setopt($curl, CURLOPT_URL, $url);
    curl_setopt($curl, CURLOPT_HEADER, false);
    //curl_setopt($curl, CURLOPT_REFERER,$url);
    curl_setopt($curl, CURLOPT_USERAGENT, "Mozilla/4.0 (compatible; MSIE 6.0; SeaPort/1.2; Windows NT 5.1; SV1; InfoPath.2)");
    curl_setopt($curl, CURLOPT_COOKIEJAR, 'cookie.txt');
    curl_setopt($curl, CURLOPT_COOKIEFILE, 'cookie.txt');
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1); // return the response instead of printing it
    curl_setopt($curl, CURLOPT_FOLLOWLOCATION, 0); // do not follow redirects
    $values = curl_exec($curl); // false on failure
    curl_close($curl);
    return $values;
}
?>
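The regular expressions in Example 2 depend on the exact attribute order and quoting of each `<img>` tag. As a possible alternative, the sketch below collects image addresses with PHP's DOM extension and then makes them absolute, in the same spirit as `rev_site`. The function names `extract_image_urls` and `to_absolute_url` and the `example.com` addresses are hypothetical and not part of the original code. The sketch assumes the DOM extension is enabled and PHP 7.4 or later, and it only handles addresses that are already absolute or relative to the site root.

```php
<?php
// Hypothetical helpers for illustration; they are not part of the original article.

/**
 * Collect the src attribute of every <img> element in an HTML string.
 */
function extract_image_urls(string $html): array
{
    $doc = new DOMDocument();
    // Real pages are rarely valid HTML; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $urls = [];
    foreach ($doc->getElementsByTagName('img') as $img) {
        $src = trim($img->getAttribute('src'));
        if ($src !== '') {
            $urls[] = $src; // keep only images that actually have a src
        }
    }
    return $urls;
}

/**
 * Prefix a relative address with the site address.
 * Simplified: does not resolve "../" segments or protocol-relative URLs.
 */
function to_absolute_url(string $url, string $baseSite): string
{
    if (preg_match('#^https?://#i', $url)) {
        return $url; // already absolute
    }
    return rtrim($baseSite, '/') . '/' . ltrim($url, '/');
}

// Placeholder markup and site address, for demonstration only.
$html = '<html><body><img src="/a/one.jpg" alt="x"><img alt="no src">'
      . '<img src="http://example.com/two.png"></body></html>';

$urls = array_map(
    fn (string $u): string => to_absolute_url($u, 'http://example.com/'),
    extract_image_urls($html)
);
print_r($urls);
```

The resulting array could then be passed to a download routine such as `get_name`. Parsing the markup avoids missing tags that use single quotes or place `src` after other attributes, at the cost of requiring the DOM extension.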