The solution to crawling garbled web pages using curl and file_get_contents

巴扎黑

Released: 2016-11-09 11:23:40

When I used curl_init to crawl Sohu's web pages today, the fetched pages came back garbled. After some analysis, I found that the server had gzip compression enabled. Setting the CURLOPT_ENCODING option with curl_setopt tells curl to decompress the gzip response, so the page is decoded correctly.
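With file_get_contents there is no CURLOPT_ENCODING option to set, so a gzip-compressed body would have to be inflated by hand. The following is only a sketch under some assumptions: the helper name decode_if_gzipped is made up for illustration, the zlib extension is assumed to be loaded, and the compressed response is simulated locally with gzencode instead of being fetched from a real server.

```php
<?php
// Hypothetical helper (not from the original article): inflate a body only
// when it starts with the gzip magic bytes 0x1f 0x8b.
function decode_if_gzipped(string $body): string
{
    if (strncmp($body, "\x1f\x8b", 2) !== 0) {
        return $body; // does not look like gzip; return it unchanged
    }
    $decoded = gzdecode($body);
    // Fall back to the raw body if decompression fails.
    return $decoded === false ? $body : $decoded;
}

// Simulate a compressed response locally instead of hitting the network.
$compressed = gzencode('<title>hello</title>');
echo decode_if_gzipped($compressed), PHP_EOL;
```

In a real script the argument would be the string returned by file_get_contents($url); plain, uncompressed bodies pass through untouched.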

Also, if the fetched page is encoded in GBK but the script itself works in UTF-8, the page must be converted with the mb_convert_encoding function.

<?php
    $tmp = sys_get_temp_dir();
    $cookieDump = tempnam($tmp, 'cookies');
    $url = 'http://tv.sohu.com';
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_HEADER, 1); // include the response headers in the output
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1); // follow redirects automatically
    curl_setopt($ch, CURLOPT_TIMEOUT, 10); // overall timeout, in seconds
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // return the response as a string instead of printing it
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10); // connection timeout, in seconds
    // Set the HTTP request header (redundant here: CURLOPT_ENCODING below sends Accept-Encoding itself)
    curl_setopt($ch, CURLOPT_HTTPHEADER, array('Accept-Encoding: gzip, deflate'));
    // Enable gzip/deflate decoding; harmless even if the page is not compressed
    curl_setopt($ch, CURLOPT_ENCODING, 'gzip,deflate');
    curl_setopt($ch, CURLOPT_COOKIEJAR, $cookieDump); // file in which to store cookies
    $content = curl_exec($ch);
    if ($content === false) {
        die(curl_error($ch)); // stop here if the request failed
    }
    // Convert the fetched page from GBK to UTF-8
    $content = mb_convert_encoding($content, "UTF-8", "GBK");
?>
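The conversion step above always treats the page as GBK. If the same script may also fetch pages that are already UTF-8, converting them again would garble them. A minimal sketch of one way to guard against that, assuming the mbstring extension is loaded; the helper name to_utf8 is hypothetical, and the validity check is only a heuristic (a GBK byte sequence can occasionally happen to be valid UTF-8):

```php
<?php
// Hypothetical helper (not from the original article): convert text to UTF-8
// only when it is not already valid UTF-8. $from defaults to GBK, the
// encoding the article deals with.
function to_utf8(string $text, string $from = 'GBK'): string
{
    if (mb_check_encoding($text, 'UTF-8')) {
        return $text; // already valid UTF-8 (or plain ASCII); leave it alone
    }
    return mb_convert_encoding($text, 'UTF-8', $from);
}

// "\xD6\xD0\xCE\xC4" is the GBK byte sequence for the two characters 中文.
echo to_utf8("\xD6\xD0\xCE\xC4"), PHP_EOL;
```

In the script above, this would replace the direct mb_convert_encoding call on $content.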


Source: php.cn
