How to Efficiently Extract Page Content Using cURL with Error Handling?-PHP Tutorial-php.cn

How to Efficiently Extract Page Content Using cURL with Error Handling?

DDD

Release: 2024-10-22 20:34:26

How to Extract Page Content Using cURL: A Detailed Solution

Understanding the Issue

When scraping the HTML of a web page with cURL, it is common to run into redirects or "page moved" responses. These can often be traced to specially encoded characters in the query string.
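If unencoded characters in the query string are the suspected cause, one option is to build the URL programmatically rather than by string concatenation. The following is a minimal sketch, not part of the original solution: `build_encoded_url()` is a hypothetical helper, and `example.com` and the parameter names are placeholders. It relies on PHP's built-in `http_build_query()` with the `PHP_QUERY_RFC3986` flag.

```php
<?php
// Hypothetical helper (not from the original article): appends a
// percent-encoded query string to a base URL.
function build_encoded_url(string $base, array $params): string
{
    // http_build_query() encodes every key and value. PHP_QUERY_RFC3986
    // encodes spaces as %20 instead of "+".
    $query = http_build_query($params, '', '&', PHP_QUERY_RFC3986);

    // Skip the "?" when there are no parameters.
    return $query === '' ? $base : $base . '?' . $query;
}

// Placeholder host and parameters, for illustration only.
echo build_encoded_url('https://example.com/search', [
    'q'    => 'cURL & PHP',
    'page' => 2,
]), PHP_EOL;
```

The resulting string can then be passed to the fetching function in place of a hand-assembled URL.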

Optimization for cURL:

To retrieve the page content without running into these issues, configure cURL to send a browser-like user agent, persist cookies, and follow redirects, as in the following function:

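<code class="php">/**
 * Fetch a URL with cURL.
 * Returns the curl_getinfo() array plus 'errno', 'errmsg', and 'content' keys.
 */
function get_web_page($url) {
    // Browser-like user agent string sent with the request.
    $user_agent = 'Mozilla/5.0 (Windows NT 6.1; rv:8.0) Gecko/20100101 Firefox/8.0';

    $options = array(
        CURLOPT_CUSTOMREQUEST  => "GET",        // use the GET method
        CURLOPT_POST           => false,        // do not send a POST request
        CURLOPT_USERAGENT      => $user_agent,  // identify as a browser
        CURLOPT_COOKIEFILE     => "cookie.txt", // read cookies from this file
        CURLOPT_COOKIEJAR      => "cookie.txt", // write cookies back when the handle closes
        CURLOPT_RETURNTRANSFER => true,         // return the body instead of printing it
        CURLOPT_HEADER         => false,        // leave response headers out of the body
        CURLOPT_FOLLOWLOCATION => true,         // follow redirects
        CURLOPT_ENCODING       => "",           // accept every encoding cURL supports
        CURLOPT_AUTOREFERER    => true,         // set Referer automatically on redirect
        CURLOPT_CONNECTTIMEOUT => 120,          // seconds to wait while connecting
        CURLOPT_TIMEOUT        => 120,          // seconds allowed for the whole request
        CURLOPT_MAXREDIRS      => 10,           // stop after 10 redirects
    );

    $ch      = curl_init($url);
    curl_setopt_array($ch, $options);
    $content = curl_exec($ch);   // false on failure
    $err     = curl_errno($ch);  // 0 when cURL reported no error
    $errmsg  = curl_error($ch);
    $header  = curl_getinfo($ch);
    curl_close($ch);

    // Attach the error details and body to the transfer info.
    $header['errno']   = $err;
    $header['errmsg']  = $errmsg;
    $header['content'] = $content;
    return $header;
}</code>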
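Because the function returns the full `curl_getinfo()` array, the redirect behaviour that motivated it can be inspected after the call. The sketch below is illustrative only: `summarize_redirects()` is a hypothetical helper, and the sample array stands in for a real `get_web_page()` result. The `url`, `redirect_count`, and `http_code` keys it reads are fields that `curl_getinfo()` populates.

```php
<?php
// Hypothetical helper (not from the original article): describes how many
// redirects were followed and where the request ended up.
function summarize_redirects(array $info): string
{
    $count = (int) ($info['redirect_count'] ?? 0);
    $final = $info['url'] ?? '(unknown)';
    $code  = (int) ($info['http_code'] ?? 0);

    if ($count === 0) {
        return "No redirects; HTTP $code from $final";
    }
    return "Followed $count redirect(s); HTTP $code from $final";
}

// Sample data standing in for a real get_web_page() result.
echo summarize_redirects([
    'url'            => 'https://example.com/new-location',
    'redirect_count' => 2,
    'http_code'      => 200,
]), PHP_EOL;
```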

Example:

Fetch the page, then check for cURL-level and HTTP-level errors before using the content:

<code class="php">$url = 'https://example.com/'; // placeholder; substitute the page to fetch
$result = get_web_page($url);

if ($result['errno'] != 0) {
    // Error handling for invalid URL, timeout, or redirect loops.
}

if ($result['http_code'] != 200) {
    // Error handling for issues like missing page, permission denial, or unavailability.
}

$page = $result['content'];</code>
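The two checks above can also be folded into a single function that turns a result array into a readable status. This is a sketch under assumptions rather than part of the original solution: `describe_fetch_result()` is a hypothetical name, and the mapping from status codes to messages is one possible choice.

```php
<?php
// Hypothetical helper (not from the original article): maps a
// get_web_page() result to a short status string.
function describe_fetch_result(array $result): string
{
    // A non-zero errno means cURL itself failed (bad URL, timeout,
    // too many redirects, and so on), so no HTTP status is meaningful.
    if ((int) ($result['errno'] ?? 0) !== 0) {
        return 'transport error: ' . ($result['errmsg'] ?? 'unknown');
    }

    $code = (int) ($result['http_code'] ?? 0);
    if ($code === 200) {
        return 'ok';
    }
    if ($code === 404) {
        return 'not found';
    }
    if ($code === 401 || $code === 403) {
        return 'permission denied';
    }
    if ($code >= 500) {
        return 'server unavailable';
    }
    return "unexpected HTTP status $code";
}

// Sample data standing in for a real get_web_page() result.
echo describe_fetch_result(['errno' => 0, 'http_code' => 403]), PHP_EOL;
```

Checking `errno` first matters because a failed transfer typically reports an `http_code` of 0, which would otherwise be reported as an unexpected status.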
