
PHP: how can I accurately get all the hyperlinks on a website?

WBOY
Published: 2016-06-06 20:22:03

I want to get all the hyperlinks on a website, and I am using the PHP Snoopy class:

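<code>require_once 'Snoopy.class.php';   // path to the Snoopy library file
$snoopy = new Snoopy;
$sourceURL = $url;                  // the page to scan
$snoopy->fetchlinks($sourceURL);
$content = $snoopy->results;        // array of link URLs</code>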

The result looks like this:

<code>array (size=627)
  0 => string 'http://www.alibaba.com/https://login.alibaba.com/' (length=49)
  1 => string 'http://sh.vip.alibaba.com?tracelog=nav_ma' (length=41)
  2 => string 'http://message.alibaba.com/feedback/default.htm?routeto=inbox&tracelog=nav_ma_mc' (length=80)
  3 => string 'http://www.alibaba.com//hz-favorite.alibaba.com/favorite/favorite_home.htm?tracelog=nav_ma_fav' (length=94)
  4 => string 'http://rfq.alibaba.com/form.htm?tracelog=header_myalibaba' (length=57)
  5 => string 'http://hz.sourcing.alibaba.com/rfq/request/rfq_manage_list.htm?tracelog=nav_ma_mana_rfq' (length=87)
  6 => string 'http://biz.alibaba.com/generalorders/list_orders.htm?tracelog=ma_mana_orders' (length=76)
  7 => string 'http://sh.vip.alibaba.com/product/post_product_interface.htm?tracelog=newschp_nav_madp' (length=86)
  8 => string 'http://sh.vip.alibaba.com/product/manage_products.htm?tracelog=newschp_nav_mamng' (length=80)
  9 => string 'http://hz.sourcing.alibaba.com/rfq/quotation/rfq_not_quoted_manage_list.htm?nav_ma_rec_rfqs' (length=91)
  10 => string 'http://www.alibaba.com/javascript:;' (length=35)
  11 => string 'http://www.alibaba.com/Products?tracelog=beacon_cate_140704' (length=59)
  12 => string 'http://rfq.alibaba.com/form.htm?tracelog=header_forbuyers' (length=57)
  13 => string 'http://globalexpo.alibaba.com?tracelog=beacon_expo_150820' (length=57)
  14 => string 'http://wholesale.alibaba.com?tracelog=nav_ws' (length=44)
  15 => string 'http://buyer.alibaba.com/bizid_buyer?tracelog=nav_bi' (length=52)
  16 => string 'http://tradeassurance.alibaba.com/bao/buyer_advertise.htm?tracelog=from_home_menu' (length=81)
  17 => string 'http://activities.alibaba.com/alibaba/secure-payment.php?tracelog=beacon_payment_150114' (length=87)
  18 => string 'http://ecredit.alibaba.com/ecl/buyer.htm?tracelog=beacon_credit_140704' (length=70)
  19 => string 'http://inspection.alibaba.com/?tracelog=beacon_is_140704' (length=56)
  20 => string 'http://buyer.alibaba.com/intelligence?tracelog=beacon_ti_140704' (length=63)
  21 => string 'http://buyer.alibaba.com/forum?tracelog=beacon_df_140704' (length=56)
  22 => string 'http://ask.alibaba.com/?tracelog=beacon_ta_140704' (length=49)
  23 => string 'http://www.alibaba.com/javascript:;' (length=35)
  24 => string 'http://seller.alibaba.com/memberships/index.html?tracelog=seller_channel_member_hp_header' (length=89)
  25 => string 'http://seller.alibaba.com/learningcenter?tracelog=seller_channel_lc_hp_header' (length=77)
  26 => string 'http://seller.alibaba.com/training.htm?tracelog=seller_channel_training_hp_header' (length=81)
  27 => string 'http://sourcing.alibaba.com/?tracelog=newschp_nav_narfq' (length=55)
  28 => string 'http://www.alibaba.com/javascript:;' (length=35)</code>

How can I remove URLs like "http://www.alibaba.com/javascript:;"?
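Judging from the dump, Snoopy appears to prepend the page's base URL to every `href` that does not start with `http://`. That would explain the `javascript:;` entries (10, 23, 28) as well as entry 0 (an `https://` link) and entry 3 (a protocol-relative `//host/...` link). Below is a minimal sketch of post-processing `$snoopy->results` under that assumption. The helper `cleanSnoopyLinks()` is hypothetical (not part of Snoopy), and `$root` is assumed to be the scheme plus host of the scanned page. It drops entries containing `javascript:`, strips the wrongly added prefix from the other two patterns, and removes duplicates.

```php
<?php
// Hypothetical helper, not part of Snoopy. Assumes $root is the scheme + host
// that Snoopy prepended to links it treated as relative, e.g. 'http://www.alibaba.com'.
function cleanSnoopyLinks(array $links, string $root): array
{
    $root   = rtrim($root, '/');
    $prefix = $root . '/';
    $scheme = parse_url($root, PHP_URL_SCHEME) ?: 'http';
    $clean  = [];

    foreach ($links as $link) {
        // Drop JavaScript pseudo-links such as "http://www.alibaba.com/javascript:;".
        if (stripos($link, 'javascript:') !== false) {
            continue;
        }
        if (strncasecmp($link, $prefix, strlen($prefix)) === 0) {
            $rest = substr($link, strlen($prefix));
            if (preg_match('#^https?://#i', $rest)) {
                // An absolute link was prefixed: keep only the original URL.
                $link = $rest;
            } elseif (strpos($rest, '/') === 0) {
                // A protocol-relative "//host/path" link was prefixed:
                // rebuild it with the page's scheme.
                $link = $scheme . ':/' . $rest;
            }
        }
        $clean[] = $link;
    }
    // Remove duplicates and re-index to a plain 0..n-1 list.
    return array_values(array_unique($clean));
}

$links = [
    'http://www.alibaba.com/https://login.alibaba.com/',
    'http://www.alibaba.com//hz-favorite.alibaba.com/favorite/favorite_home.htm?tracelog=nav_ma_fav',
    'http://www.alibaba.com/javascript:;',
    'http://www.alibaba.com/Products?tracelog=beacon_cate_140704',
    'http://www.alibaba.com/javascript:;',
];
print_r(cleanSnoopyLinks($links, 'http://www.alibaba.com'));
```

Matching `javascript:` anywhere in the string is a blunt filter, because Snoopy has already moved it away from the start of the URL. A genuine link that happens to contain that text in its query string would be dropped as well.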

Replies:

QueryList

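<code class="php"><?php
// Assumes QueryList 3.x installed via Composer
require 'vendor/autoload.php';
use QL\QueryList;

// Scrape all images on a page
$data = QueryList::Query('http://cms.querylist.cc/bizhi/453.html',['image' => ['img','src']])->data;
// Print the result
print_r($data);

// Scrape all hyperlinks on a page
$data = QueryList::Query('http://cms.querylist.cc/google/list_1.html',['link' => ['a','href']])->data;
// Print the result
print_r($data);</code>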

http://git.oschina.net/jae/QueryList
Take a look at this. It is somewhat more powerful than Snoopy and supports jQuery selector syntax.
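QueryList presumably returns the `href` attribute as written in the page, so a `javascript:;` link would still appear, just without Snoopy's base-URL prefix. Below is a sketch of filtering such results by URL scheme. It assumes `->data` from the hyperlink example above is a list of rows shaped like `['link' => '...']`, keyed by the rule name. The helper `keepWebLinks()` and the sample rows are hypothetical.

```php
<?php
// Hypothetical helper. Assumes rows shaped like ['link' => href], as the
// ['link' => ['a','href']] rule above is assumed to produce.
function keepWebLinks(array $rows): array
{
    $links = [];
    foreach ($rows as $row) {
        $href = trim($row['link'] ?? '');
        if ($href === '' || $href[0] === '#') {
            continue; // empty href or in-page anchor
        }
        // A URI scheme is a letter followed by letters, digits, "+", "." or "-",
        // then ":" (RFC 3986). Keep only http and https; skip javascript:, mailto:, ...
        if (preg_match('#^([a-z][a-z0-9+.-]*):#i', $href, $m)
            && !in_array(strtolower($m[1]), ['http', 'https'], true)) {
            continue;
        }
        $links[] = $href; // absolute, protocol-relative or relative link
    }
    // Remove duplicates and re-index.
    return array_values(array_unique($links));
}

$rows = [
    ['link' => 'https://login.alibaba.com/'],
    ['link' => 'javascript:;'],
    ['link' => '//hz-favorite.alibaba.com/favorite/favorite_home.htm'],
    ['link' => 'mailto:someone@example.com'],
    ['link' => '/Products'],
    ['link' => '#top'],
];
print_r(keepWebLinks($rows));
```

Relative and protocol-relative links in the result still have to be resolved against the page URL before they can be fetched.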

Source: php.cn