JSoup can send HTTP requests for you, capture the returned HTML in a Document object, and then provide a set of jQuery-like APIs for querying and parsing the information in that document.
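As a minimal sketch of that workflow (assuming jsoup is on the classpath; the HTML snippet and selectors here are made up for illustration), you can parse a document and query it with CSS selectors. In a real crawl you would fetch over the network with `Jsoup.connect(url).get()` instead of parsing an inline string:

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class JsoupQueryDemo {
    public static void main(String[] args) {
        // Real crawl: Document doc = Jsoup.connect("https://example.com").get();
        // Inline snippet here so the example runs offline.
        String html = "<html><head><title>Demo</title></head>"
                + "<body><a href='/page/1'>first</a><a href='/page/2'>second</a></body></html>";
        Document doc = Jsoup.parse(html);

        System.out.println(doc.title());
        // jQuery-like CSS selector: every anchor that has an href attribute
        for (Element link : doc.select("a[href]")) {
            System.out.println(link.attr("href") + " -> " + link.text());
        }
    }
}
```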
Each site has its own URL scheme, or JSON/JSONP requests, for paging. You have to work out and handle that part yourself.
You can also use an HTTP library such as HttpClient to fetch the raw HTML, build a JSoup Document from it, let JSoup parse the content, and then save the result to whatever persistence layer you prefer (local file, database, in-memory store...).
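That fetch-parse-persist pipeline might look like the following sketch (the URL and file name are placeholders; it uses the JDK 11+ `java.net.http.HttpClient` for the fetch step, but Apache HttpClient would play the same role):

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Path;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class FetchParsePersist {
    // Step 1: fetch the raw HTML with an HTTP client (not with JSoup).
    static String fetch(String url) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }

    // Step 2: hand the raw HTML to JSoup and query the resulting Document.
    static String extractTitle(String html) {
        Document doc = Jsoup.parse(html);
        return doc.title();
    }

    public static void main(String[] args) throws Exception {
        String html = fetch("https://example.com");   // placeholder URL
        String title = extractTitle(html);
        // Step 3: persist however you like -- a local file in this sketch.
        Files.writeString(Path.of("title.txt"), title);
    }
}
```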
Whether a page can be crawled at all, or whether it must be fetched through a proxy (that is, how to defeat anti-crawling measures), is not JSoup's job, just as HttpClient is responsible for fetching content but never parses it.
A crawler usually fetches a seed page first; the seed contains the URL patterns for all the other pages, and the crawler then reaches the rest of the site through it.
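The seed step can be sketched like this (the seed HTML, base URI, and page URLs are invented for illustration): extract every link from the seed page, resolving relative URLs against the site's base, then fetch each resulting URL in turn.

```java
import java.util.ArrayList;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class SeedCrawler {
    // Collect the URLs the seed page links to; each one would then be
    // fetched and parsed in the same way as the seed itself.
    static List<String> extractLinks(String seedHtml, String baseUri) {
        Document seed = Jsoup.parse(seedHtml, baseUri);
        List<String> urls = new ArrayList<>();
        for (Element a : seed.select("a[href]")) {
            urls.add(a.absUrl("href"));   // resolve relative links against the base URI
        }
        return urls;
    }

    public static void main(String[] args) {
        String seedHtml = "<a href='/list?page=1'>p1</a><a href='/list?page=2'>p2</a>";
        List<String> urls = extractLinks(seedHtml, "https://example.com");
        // Each URL would now be fetched, e.g. with Jsoup.connect(url).get()
        System.out.println(urls);
    }
}
```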