This article walks through an example of crawling a website's images with Node.js. The full walkthrough follows:
Principle:
A crawler is a typical I/O-intensive application: most of its time goes to waiting on network responses and file writes. Node keeps the overhead of that waiting small, which makes it a convenient choice for this kind of data collection.
Use the express module to build the Node service
Use the request module to fetch the HTML of the target page
Use the cheerio module to process the HTML (cheerio has a jQuery-like syntax, so it is easy and convenient to use)
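The point about I/O waiting can be illustrated with a small sketch. Timers stand in for real downloads here, and `fakeDownload` and `downloadAll` are hypothetical names that do not appear in the original example. Because the three waits overlap, the total time should be close to the longest single wait rather than the sum of all three.

```javascript
// Hypothetical stand-in for a network download: resolves with the
// file name after `ms` milliseconds, without blocking the event loop.
function fakeDownload(name, ms) {
  return new Promise(function (resolve) {
    setTimeout(function () {
      resolve(name);
    }, ms);
  });
}

// Start all three "downloads" at once and wait for every one to finish.
async function downloadAll() {
  const started = Date.now();
  const names = await Promise.all([
    fakeDownload('a.jpeg', 100),
    fakeDownload('b.jpeg', 100),
    fakeDownload('c.jpeg', 100),
  ]);
  return { names: names, elapsedMs: Date.now() - started };
}

downloadAll().then(function (result) {
  // Promise.all keeps the results in the order the promises were passed in.
  console.log(result.names, result.elapsedMs + ' ms');
});
```

A real crawler behaves the same way when it requests several images: each request is handed to the system, and Node continues with other work while the responses arrive.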
Environment configuration:
npm install express request cheerio --save
(1) Import the modules
var https = require('https'); // the target page is served over HTTPS
var request = require('request');
var cheerio = require('cheerio');
var fs = require('fs'); // used for file operations
var url = 'https://movie.douban.com/cinema/nowplaying/beijing/'; // the page to crawl
(2) Send the request
https.get(url, function (res) {
  var html = '';
  res.setEncoding('utf-8'); // prevent garbled Chinese characters
  res.on('data', function (chunk) {
    html += chunk; // listen for the data event and append one chunk at a time
  });
  res.on('end', function () {
    var $ = cheerio.load(html); // parse the HTML once all data has arrived
    // save the images found on the page into the images folder
    $('.mod-bd img').each(function (index, item) {
      // read the image attributes
      var imgName = $(this).parent().next().text().trim();
      var imgfile = imgName + '.jpeg';
      var imgSrc = $(this).attr('src');
      // use the request module to ask the server for the image resource
      request.head(imgSrc, function (error, headRes, body) {
        if (error) {
          console.log('Request failed');
        }
      });
      // pipe the image into ./images with the fs module (the folder must already exist)
      request(imgSrc).pipe(fs.createWriteStream('./images/' + imgfile));
    });
  });
});
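Two details in the download step may need extra care: the text used as a file name can contain characters that are not valid in a path, and an image's `src` may be relative or protocol-relative. The sketch below shows one possible way to handle both using only Node built-ins. `toSafeFileName` and `toAbsoluteUrl` are hypothetical helpers, the sample title and image addresses are made up for illustration, and the set of replaced characters is an assumption based on what Windows file names reject.

```javascript
// Hypothetical helper: turn scraped text into a file name that is safe on
// common file systems by replacing path separators and reserved characters.
function toSafeFileName(title, ext) {
  const cleaned = String(title)
    .trim()
    .replace(/[\\\/:*?"<>|]/g, '_') // assumed set: characters Windows rejects
    .replace(/\s+/g, ' ');          // collapse runs of whitespace
  return (cleaned || 'untitled') + ext;
}

// Hypothetical helper: resolve an <img src> value against the page address
// so relative and protocol-relative sources become absolute URLs.
function toAbsoluteUrl(src, pageUrl) {
  return new URL(src, pageUrl).href;
}

const pageUrl = 'https://movie.douban.com/cinema/nowplaying/beijing/';

console.log(toSafeFileName('  A/B: Part 2 ', '.jpeg'));
// prints "A_B_ Part 2.jpeg"
console.log(toAbsoluteUrl('//img.example.com/p1.jpg', pageUrl));
// prints "https://img.example.com/p1.jpg"
console.log(toAbsoluteUrl('posters/p2.jpg', pageUrl));
// prints "https://movie.douban.com/cinema/nowplaying/beijing/posters/p2.jpg"
```

In the loop above, these helpers could be applied to `imgName` and `imgSrc` before the request is made and the write stream is opened.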
That is everything I have put together on this topic; I hope you find it useful.