Using PHP regular expressions to extract the hyperlink addresses from a given URL page

WBOY

Release： 2016-07-20 11:16:58

In data collection and page analysis, it is often necessary to capture the content of a given URL, and sometimes also the content of the pages it links to, two or three levels deep.

The following is a test implementation, for reference only.

The code is as follows:

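/*
Match the links in the given page
$host: scheme and host of the page, prepended to relative links
$document: HTML source of the page
return: array of absolute link addresses
*/
function match_links($host, $document) {
    // Groups: 1 = opening quote, 2 = quoted href, 3 = unquoted href, 4 = link text.
    // The pattern is reconstructed (assumption) to fit the group indexes used below.
    $pattern = '~<\s*a\s.*?href\s*=\s*(["\'])?(?(1)(.*?)\1|([^\s>]+))[^>]*>?(.*?)</a>~isx';
    preg_match_all($pattern, $document, $links);

    $match = array('link' => array(), 'content' => array(), 'all' => array());

    // Quoted hrefs (group 2) first, then unquoted hrefs (group 3)
    foreach (array_merge($links[2], $links[3]) as $val) {
        if (empty($val))
            continue;
        if (preg_match('~^https?://~i', $val)) {
            $match['link'][] = $val; // already absolute
        }
        else {
            // Relative link: prepend the host with exactly one slash in between
            $match['link'][] = rtrim($host, '/') . '/' . ltrim($val, '/');
        }
    }
    foreach ($links[4] as $val) {
        if (!empty($val))
            $match['content'][] = $val;
    }
    foreach ($links[0] as $val) {
        if (!empty($val))
            $match['all'][] = $val;
    }
    // Only the link addresses are returned; an empty array means no links were found
    return $match['link'];
}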
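
Regular expressions are fragile against real-world HTML, so it can be worth comparing the result with a parser-based approach. The following is a minimal sketch that swaps the regex for PHP's DOM extension; the helper name `extract_links_dom` and the sample HTML are assumptions made for illustration and are not part of the original listing.

```php
<?php
// Hypothetical helper: collect href values with DOMDocument instead of a regex.
function extract_links_dom(string $host, string $document): array {
    if ($document === '') {
        return []; // loadHTML() rejects an empty string
    }
    $dom = new DOMDocument();
    // Collect parser warnings internally; malformed markup is common on real pages.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($document);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = [];
    foreach ($dom->getElementsByTagName('a') as $a) {
        $href = trim($a->getAttribute('href'));
        if ($href === '') {
            continue; // anchor without a usable href
        }
        if (!preg_match('~^https?://~i', $href)) {
            // Relative link: prepend the host with exactly one slash in between.
            $href = rtrim($host, '/') . '/' . ltrim($href, '/');
        }
        $links[] = $href;
    }
    // Remove duplicates and reindex from 0.
    return array_values(array_unique($links));
}

$html = '<p><a href="/about.html">About</a> '
      . '<a href="http://example.org/x">X</a> '
      . '<a href="/about.html">About again</a></p>';
print_r(extract_links_dom('http://example.com', $html));
```

The sketch treats every non-http(s) href as a path relative to the site root, as the function above does; fragments such as `#top` or `mailto:` links would need extra filtering.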

/*
Get the page text content from the given url
*/
function get_content_from_url($url) {
    $str = @file_get_contents($url);
    if ($str === false)
        return NULL;
    // Convert only when the page is not already valid UTF-8;
    // such pages are assumed to be GBK-encoded
    if (!mb_check_encoding($str, "UTF-8"))
        $str = iconv("GBK", "UTF-8", $str);
    $str = strip_tags($str); // Filter html tags
    /*
    Alternative to strip_tags():
    $str = preg_replace( "@<(.*?)>@is", "", $str );
    */
    // Filter non-Chinese characters: keep runs in the range U+4E00 to U+9FFF
    preg_match_all('/[\x{4e00}-\x{9fff}]+/u', $str, $matches);
    $str = join(',', $matches[0]);
    if (!$str)
        return NULL;

    return $str;
}
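
The filtering step can be tried without any network access. The following is a small sketch on a literal string; `keep_chinese` is a hypothetical helper name, and the file is assumed to be saved as UTF-8.

```php
<?php
// Hypothetical helper mirroring the filtering step above:
// keep only runs of characters in the range U+4E00 to U+9FFF.
function keep_chinese(string $text): string {
    // The /u modifier makes PCRE treat pattern and subject as UTF-8,
    // which is required for the \x{...} code point escapes.
    preg_match_all('/[\x{4e00}-\x{9fff}]+/u', $text, $matches);
    return implode(',', $matches[0]);
}

$html = '<h1>PHP 教程</h1><p>Hello, 世界!</p>';
echo keep_chinese(strip_tags($html)), "\n"; // prints "教程,世界"
```

Latin letters, digits and punctuation are dropped, and each remaining run of Chinese characters becomes one comma-separated item.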

/*
Get the text of the given url and, while $depth > 1, of the pages it links to
*/
function get_content($url, $depth) {
    if (!$url || $depth < 1)
        return false;

    $content = '';
    if ($depth > 1) {
        $str = @file_get_contents($url);
        if (!$str)
            return false;

        $parseurl = parse_url($url);
        $host = '';
        if (isset($parseurl['scheme'], $parseurl['host']))
            $host = $parseurl['scheme'] . "://" . $parseurl['host'];

        $arrlink = match_links($host, $str);
        $arr_url = array_unique($arrlink);

        // A separate loop variable keeps $url intact for the final call below
        foreach ($arr_url as $link) {
            $content .= get_content($link, $depth - 1); // Recursive call
        }
    }

    $content .= get_content_from_url($url);

    return $content;
}
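
Pages on the same site usually link back to each other, so a recursive crawl like the one above may fetch the same URL many times. One possible refinement, shown here only as a sketch, is to track visited URLs. The function `crawl`, the `$fetch` callable and the in-memory `$graph` are assumptions made so that the example runs without network access.

```php
<?php
// Hypothetical sketch: depth-limited traversal that skips URLs already visited.
// $fetch is an assumed callable that returns the list of links found on a URL.
function crawl(string $url, int $depth, callable $fetch, array &$visited = []): array {
    if ($depth < 1 || isset($visited[$url])) {
        return []; // depth exhausted or page already handled
    }
    $visited[$url] = true;
    $order = [$url];
    if ($depth > 1) {
        foreach ($fetch($url) as $link) {
            // $visited is shared by reference across the whole recursion.
            $order = array_merge($order, crawl($link, $depth - 1, $fetch, $visited));
        }
    }
    return $order;
}

// A tiny in-memory "site" standing in for real pages.
$graph = ['a' => ['b', 'c'], 'b' => ['a', 'c'], 'c' => ['d'], 'd' => []];
$fetch = fn(string $u): array => $graph[$u] ?? [];

print_r(crawl('a', 3, $fetch)); // visits a, b, c in that order
```

Because the traversal is depth-first, a page first reached at the depth limit (here `c`, via `b`) is marked as visited and is not expanded again when it is met at a shallower level; a breadth-first queue would avoid that.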

source：php.cn
