
Implementation code for web scraping using PhantomJS

WBOY
Release: 2016-05-16 16:35:00

PhantomJS is a headless browser that can run JavaScript and work with DOM nodes, which makes it well suited to web scraping.

For example, suppose we want to batch-scrape the "Today in History" entries from the website http://www.todayonhistory.com/.

Looking at the DOM structure, we only need the title attribute of each element matching .list li a. So we use querySelectorAll to select those links and join their titles into one string:

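var d = '';
// Select every link inside the list items of the .list element
var c = document.querySelectorAll('.list li a');
var l = c.length;
for (var i = 0; i < l; i++) {
    d = d + c[i].title + '\n'; // one title per line
}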
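
The title-joining loop can also be written as a small reusable function. The following is only a sketch: collectTitles is a hypothetical name that does not appear in the original code, and the plain objects in the sample stand in for real link elements. Unlike the loop above, it skips links with an empty title and does not append a trailing newline.

```javascript
// Hypothetical helper (not part of the original article): joins the title
// of each link-like object, one title per line.
function collectTitles(links) {
    var titles = [];
    for (var i = 0; i < links.length; i++) {
        var title = links[i].title;
        // Skip links without a title so the output contains no blank lines.
        if (title) {
            titles.push(title);
        }
    }
    return titles.join('\n');
}

// Plain objects stand in for <a> elements here (made-up sample data).
var sample = [{ title: 'First event' }, { title: '' }, { title: 'Second event' }];
console.log(collectTitles(sample));
```

In a real page you would call it as collectTitles(document.querySelectorAll('.list li a')). Note that a function passed to page.evaluate runs inside the page and cannot see variables from the PhantomJS script, so a helper like this has to be defined inside that function.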

After that, all that remains is to run this JavaScript inside PhantomJS:

var page = require('webpage').create();

// Open the page; the callback receives 'success' or 'fail'.
page.open('http://www.todayonhistory.com/', function (status) {
    if (status !== 'success') {
        console.log('FAIL to load the address');
    } else {
        // The function passed to evaluate() runs inside the page itself.
        var titles = page.evaluate(function () {
            var d = '';
            var c = document.querySelectorAll('.list li a');
            for (var i = 0; i < c.length; i++) {
                d = d + c[i].title + '\n';
            }
            return d;
        });
        console.log(titles);
    }
    phantom.exit();
});

Finally, we save the script as catch.js, run it from the command prompt, and redirect the output to a text file (for example, phantomjs catch.js > result.txt). You can also write the file directly with the PhantomJS file API.
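
As a rough sketch of the file API route, the PhantomJS fs module provides write(path, content, mode). The helper name saveText, the file name today.txt, and the sample text below are assumptions made for illustration. The fs module is passed in as a parameter so the helper can be tried outside PhantomJS with a stand-in object.

```javascript
// Hypothetical helper: writes text through a PhantomJS-style fs module.
// The module is passed in so the function can be exercised outside PhantomJS.
function saveText(fsModule, path, text) {
    // Mode 'w' opens the file for writing and replaces any existing content.
    fsModule.write(path, text, 'w');
    return path;
}

// This part only runs under PhantomJS, where the global `phantom` exists.
if (typeof phantom !== 'undefined') {
    // 'today.txt' and the sample text are assumptions for illustration.
    saveText(require('fs'), 'today.txt', 'sample line\n');
    phantom.exit();
}
```

In catch.js, the same call could replace console.log, passing the string returned by page.evaluate as the text argument.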

source:php.cn