java - Help wanted: how do I solve the pagination problem with Jsoup?
大家讲道理 2017-04-17 17:13:52


All replies (2)
洪涛

Jsoup sends the HTTP request for you, retrieves the returned HTML, stores it in a Document object, and then provides a jQuery-like API for querying and parsing the information in that HTML document.
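To make the jQuery-like querying concrete, here is a minimal sketch. The class name `JsoupSelectSketch`, the helper `extractTexts`, the sample HTML and the selector `li.item` are all made up for illustration; for a live page you would obtain the Document with `Jsoup.connect(url).get()` instead of `Jsoup.parse`.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

public class JsoupSelectSketch {
    // Collect the text of every element that matches a CSS selector.
    static List<String> extractTexts(Document doc, String cssQuery) {
        List<String> texts = new ArrayList<>();
        for (Element el : doc.select(cssQuery)) {
            texts.add(el.text());
        }
        return texts;
    }

    public static void main(String[] args) {
        // Hypothetical markup; a real page would come from Jsoup.connect(url).get().
        String html = "<ul><li class=\"item\">first</li><li class=\"item\">second</li></ul>";
        Document doc = Jsoup.parse(html);
        System.out.println(extractTexts(doc, "li.item"));
    }
}
```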

Each site handles page turning in its own way: through a specific URL pattern, or through a JSON or JSONP request. You have to work out which one the site uses and handle it yourself.
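For the simplest case, where the page number is part of the URL, the loop might look roughly like this. Everything site-specific here is an assumption: the URL template `https://example.com/list?page=%d`, the selector `h2.title`, and the rule "stop at the first page with no matching items". The `PageFetcher` interface is a hypothetical helper so the loop can be tried without a network.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class PagingSketch {
    // Fetches one page; abstracted so the loop does not depend on the network.
    interface PageFetcher {
        Document fetch(String url) throws IOException;
    }

    // Walks page 1, 2, 3, ... and stops at maxPages or at the first page without items.
    static List<String> crawl(String urlTemplate, String itemSelector,
                              int maxPages, PageFetcher fetcher) throws IOException {
        List<String> items = new ArrayList<>();
        for (int page = 1; page <= maxPages; page++) {
            String url = String.format(urlTemplate, page);
            Document doc = fetcher.fetch(url);
            List<Element> found = doc.select(itemSelector);
            if (found.isEmpty()) {
                break; // assumed end-of-list signal; real sites may behave differently
            }
            for (Element el : found) {
                items.add(el.text());
            }
        }
        return items;
    }

    public static void main(String[] args) throws IOException {
        // Real fetcher backed by Jsoup; the URL and selector below are placeholders.
        PageFetcher overHttp = url -> Jsoup.connect(url)
                .userAgent("Mozilla/5.0")
                .timeout(10_000)
                .get();
        List<String> titles = crawl("https://example.com/list?page=%d", "h2.title", 50, overHttp);
        titles.forEach(System.out::println);
    }
}
```

If the site loads further pages through a JSON or JSONP request instead, the same loop shape applies, but the fetcher would need to request that endpoint and the parsing step would change accordingly.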

You can use an HTTP client library such as HttpClient to fetch the raw HTML, build a Jsoup Document object from it, let Jsoup parse the content, and then save the results to whatever persistence you prefer (local file, database, memory, ...).
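A rough sketch of that fetch, parse, save pipeline follows. It uses the JDK's built-in `java.net.http.HttpClient` (Java 11+) as a stand-in; the reply may well have meant Apache HttpClient, which would work the same way in principle. The class and method names, the URL and the output file `title.txt` are illustrative assumptions.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class FetchParseSaveSketch {
    // Step 1: the HTTP client only downloads the raw HTML.
    static String fetchHtml(HttpClient client, String url) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    }

    // Step 2: Jsoup only parses; the base URI lets it resolve relative links.
    static String parseTitle(String html, String baseUri) {
        Document doc = Jsoup.parse(html, baseUri);
        return doc.title();
    }

    // Step 3: persist the result; a local file stands in for any storage.
    static void save(Path target, String content) throws IOException {
        Files.write(target, content.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws Exception {
        String url = "https://example.com/"; // placeholder URL
        HttpClient client = HttpClient.newHttpClient();
        String html = fetchHtml(client, url);
        save(Path.of("title.txt"), parseTitle(html, url));
    }
}
```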

Whether a page can be crawled at all, and whether you need to go through a proxy (that is, how to get around anti-crawling measures), is outside Jsoup's scope. In the same way, HttpClient is responsible for fetching content but does not parse it.

    迷茫

    Crawlers usually fetch a seed page first. The seed page contains the URL pattern for all the other pages, and the crawler then reaches those pages starting from the seed.
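One way to picture the seed-page idea is a small breadth-first walk over the pagination links. This is only a sketch under assumptions: the selector `a.page` for pagination links is invented, `PageFetcher` is a hypothetical helper, and the Document must carry a base URI (as it does when fetched with `Jsoup.connect(url).get()`) so that `absUrl("href")` can resolve relative links.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.io.IOException;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.Set;

public class SeedCrawlSketch {
    // Fetches one page; abstracted so the traversal can be tried offline.
    interface PageFetcher {
        Document fetch(String url) throws IOException;
    }

    // Starts at the seed, follows links matching linkSelector, and visits each URL once.
    static Set<String> crawl(String seedUrl, String linkSelector,
                             int maxPages, PageFetcher fetcher) throws IOException {
        Set<String> visited = new LinkedHashSet<>(); // keeps visit order
        Deque<String> queue = new ArrayDeque<>();
        queue.add(seedUrl);
        while (!queue.isEmpty() && visited.size() < maxPages) {
            String url = queue.poll();
            if (!visited.add(url)) {
                continue; // already processed
            }
            Document doc = fetcher.fetch(url);
            for (Element link : doc.select(linkSelector)) {
                String next = link.absUrl("href"); // empty if it cannot be resolved
                if (!next.isEmpty() && !visited.contains(next)) {
                    queue.add(next);
                }
            }
        }
        return visited;
    }

    public static void main(String[] args) throws IOException {
        // Placeholder seed URL and selector.
        Set<String> pages = crawl("https://example.com/list", "a.page", 20,
                url -> Jsoup.connect(url).get());
        pages.forEach(System.out::println);
    }
}
```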
