Scraping web page information with PHP

WBOY
Release: 2016-06-13 12:53:33

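[Help] Scraping web page information with PHP
I need to scrape the usernames for different UIDs from
http://bbs.zhanzhang.baidu.com/home.php?mod=space&uid=*
(one profile page per UID).
I'm not familiar with PHP or regular expressions and just want the result, so could someone please post the PHP code directly, along with the usernames for the first 1000 UIDs? Thanks.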

------Solution--------------------
<?php
// Fetch the profile page of UID 1
$html = file_get_contents('http://bbs.zhanzhang.baidu.com/home.php?mod=space&uid=1');
// The username sits inside <h2 class="xs2">...</h2>; skip output if the fetch or match fails
if ($html !== false && preg_match('@<h2 class="xs2">(.*?)</h2>@s', $html, $match)) {
    echo strip_tags($match[1]);
}

In theory UIDs are auto-incremented, so the first 1000 users are simply UIDs 1 to 1000; change the number in the URL yourself.
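A minimal sketch of looping the answer above over UIDs 1 to 1000, assuming the profile markup still uses `<h2 class="xs2">` as in that answer. The names `PROFILE_URL`, `extractUsername` and `scrapeRange` are hypothetical, and the 0.2 s pause is an arbitrary politeness delay, not something the site documents. Parsing is kept separate from fetching so it can be checked against a saved page. Note that the heading may contain more than the bare username (the second answer's parsing suggests it also holds the UID), so you may need to trim it further.

```php
<?php
// Hypothetical names; the URL and the <h2 class="xs2"> pattern come from the answer above.
const PROFILE_URL = 'http://bbs.zhanzhang.baidu.com/home.php?mod=space&uid=';

// Return the text of the first <h2 class="xs2"> heading, or null if there is none.
function extractUsername(string $html): ?string
{
    if (!preg_match('@<h2 class="xs2">(.*?)</h2>@s', $html, $m)) {
        return null; // no heading: deleted user, error page, or changed layout
    }
    $name = trim(strip_tags($m[1]));
    return $name === '' ? null : $name;
}

// Fetch UIDs $from..$to one by one and return [uid => username].
function scrapeRange(int $from, int $to): array
{
    $names = [];
    for ($uid = $from; $uid <= $to; $uid++) {
        if ($uid > $from) {
            usleep(200000); // wait 0.2 s between requests to go easy on the server
        }
        $html = @file_get_contents(PROFILE_URL . $uid); // @ suppresses the warning on failure
        if ($html === false) {
            continue;
        }
        $name = extractUsername($html);
        if ($name !== null) {
            $names[$uid] = $name;
        }
    }
    return $names;
}
```

Usage: `print_r(scrapeRange(1, 1000));`. At one request every 0.2 s or more, 1000 UIDs take at least several minutes.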
------Solution--------------------
<?php
$url = 'http://bbs.zhanzhang.baidu.com/home.php?mod=space&uid=';

$res = array();
$uid = 1;
$maxUid = 1000; // safety cap so the loop ends even if pages keep failing
while (count($res) < 20 && $uid <= $maxUid) { // collect the first 20; change as needed
    $html = file_get_contents($url . $uid);
    // Grab the first <h2>...</h2> block on the profile page
    if ($html !== false && preg_match('/<h2.+?h2>/s', $html, $r)) {
        // Expect exactly three words: the first is the username, the third the UID
        if (preg_match_all('/\w+/', strip_tags($r[0]), $w) == 3) {
            $res[$w[0][2]] = $w[0][0];
        }
    }
    $uid++;
}
print_r($res);
Result (the array keys are UIDs and the values are usernames):

Array
(
    [1] => sitemapbbs
    [7] => _
    [8] => sitemapTest2
    [9] => sitemapTest
    [10] => sitemapTest32
    [13] => sitemapTest3
    [14] => kkksuper
    [16] => 05
    [17] => caoli456
    [18] => wangbin_ivan
    [19] => geiwosou
    [20] => sitemap_test1
    [21] => sitemap_test5
    [22] => _
    [23] => lkmmmmj
    [24] => blackfox1983
    [25] => dongbei_wb
    [26] => xyzlinger
    [27] => sanwushuosi
    [28] => 007
)
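Several entries above (`_`, `05`, `007`) look like truncated usernames. A likely cause: without the `u` modifier, `\w` matches only ASCII letters, digits and underscore, so Chinese characters are dropped and a name like `张三_` shrinks to `_`. The sketch below keeps the second answer's "exactly three words" rule but tokenises with Unicode classes. It assumes the page is UTF-8 (a GBK page would first need `iconv('GBK', 'UTF-8', $html)`), and the sample heading text `name (UID: n)` is a hypothetical format inferred from that rule, not confirmed markup. `parseHeading` is a hypothetical name.

```php
<?php
// Hypothetical helper: parse the <h2> block with the same three-word rule as the
// second answer, but with Unicode-aware tokens so non-ASCII usernames stay whole.
// Returns [uid, username], or null if the heading does not split into three words.
function parseHeading(string $h2Html): ?array
{
    $text = strip_tags($h2Html);
    // \p{L} = any letter, \p{N} = any number; /u treats the input as UTF-8
    // (preg_match_all returns false on invalid UTF-8, which also yields null here)
    if (preg_match_all('/[\p{L}\p{N}_]+/u', $text, $m) !== 3) {
        return null;
    }
    return [(int) $m[0][2], $m[0][0]];
}
```

Usernames containing spaces, dots or hyphens still split into extra tokens and are skipped, so this only narrows the problem.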

------Solution--------------------
Same as above, but `file_get_contents` can sometimes be too slow; you can use cURL instead.
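One concrete reason cURL helps here: `file_get_contents` on an HTTP URL waits up to `default_socket_timeout` (60 s by default) on a stalled read, while cURL lets you cap the connect time and total request time separately. A minimal sketch, assuming the cURL extension is installed; `fetchPage` and the timeout values are hypothetical choices, not taken from the answer.

```php
<?php
// Hypothetical helper: fetch a URL with cURL and explicit timeouts so one slow
// profile page cannot stall the whole loop. Returns the body, or null on failure.
function fetchPage(string $url, int $timeout = 10): ?string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,     // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,     // follow HTTP redirects
        CURLOPT_CONNECTTIMEOUT => 5,        // max seconds to establish the connection
        CURLOPT_TIMEOUT        => $timeout, // max seconds for the whole request
    ]);
    $body = curl_exec($ch);
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE); // 0 if no response was received
    // The handle is released when $ch goes out of scope.
    return ($body === false || $status === 0 || $status >= 400) ? null : $body;
}
```

To use it in either answer's code, replace `file_get_contents($url . $uid)` with `fetchPage($url . $uid)` and check for `null` instead of `false`.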
source:php.cn