Practical sharing: Use nodejs to crawl and download more than 10,000 images

青灯夜游 · Reposted: 2022-03-24 19:49:28 · 4705 views

This article shares a practical Node.js experience: how the author used Node.js to crawl more than 10,000 wallpapers of young women. I hope it will be helpful to everyone!

Hello, everyone, I am Xiaoma. Why do I need to download so many pictures? A few days ago, I used uni-app and uniCloud to deploy a wallpaper mini program for free, and now I need some resources to fill it with content.

Crawling pictures

First, initialize the project and install axios and cheerio:

npm init -y && npm i axios cheerio

axios is used to fetch the web page content, and cheerio is a server-side implementation of the jQuery API; we use it to get the image addresses from the DOM.

const axios = require('axios')
const cheerio = require('cheerio')

// Fetch a page and collect the src of the first <img> inside each element
// matched by the containerElement selector.
async function getImageUrl(target_url, containerElement) {
  const res = await axios.get(target_url)
  const html = res.data
  const $ = cheerio.load(html)
  const result_list = []
  // cheerio's each() passes (index, element), like jQuery
  $(containerElement).each((index, element) => {
    result_list.push($(element).find('img').attr('src'))
  })
  return result_list
}

This gives us the image URLs on the page. Next, we need to download each image from its URL.
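The src values collected from a page are often relative, protocol-relative, duplicated, or missing altogether. A minimal sketch of cleaning such a list before downloading, using only Node's built-in URL class; the helper name normalizeImageUrls and the example.com addresses are made up for illustration and are not from the original article:

```javascript
// Hypothetical helper (not part of the original article): turn raw `src`
// values into absolute, de-duplicated http(s) URLs.
function normalizeImageUrls(srcList, pageUrl) {
  const seen = new Set()
  for (const src of srcList) {
    if (!src) continue // an <img> without a src yields undefined
    let parsed
    try {
      // new URL(relative, base) resolves relative and protocol-relative paths
      parsed = new URL(src, pageUrl)
    } catch {
      continue // skip values that cannot be parsed as a URL
    }
    // Keep only links that can be fetched over HTTP(S), e.g. drop data: URIs
    if (parsed.protocol === 'http:' || parsed.protocol === 'https:') {
      seen.add(parsed.href)
    }
  }
  return [...seen]
}

// Example input with made-up addresses
const urls = normalizeImageUrls(
  ['/img/a.jpg', '//cdn.example.com/b.jpg', undefined, 'https://example.com/img/a.jpg'],
  'https://example.com/gallery/index.html'
)
console.log(urls)
// → [ 'https://example.com/img/a.jpg', 'https://cdn.example.com/b.jpg' ]
```

The result of getImageUrl could be passed through such a helper together with the page URL, so that every entry handed to the download step is a complete address.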

How to use nodejs to download files

Method 1: Use the built-in modules 'https' and 'fs'

Files can be downloaded in Node.js using either built-in modules or third-party libraries.

An HTTPS GET request fetches the file to download. createWriteStream() creates a writable stream; here it is called with a single argument, the location where the file is saved. pipe() reads data from a readable stream and writes it to a writable stream.

const fs = require('fs')
const https = require('https')

// URL of the image (placeholder; https.get() needs a full https:// URL)
const url = 'GFG.jpeg'

https.get(url, (res) => {
  // Image will be stored at this path (the files directory must already exist)
  const path = `${__dirname}/files/img.jpeg`
  const filePath = fs.createWriteStream(path)
  res.pipe(filePath)
  filePath.on('finish', () => {
    filePath.close()
    console.log('Download Completed')
  })
})
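The built-in approach reports nothing when the server answers with an error status or when the stream fails halfway. The following is a sketch of a promise-based variant that checks the status code and uses stream.pipeline() so that errors on either side reject the promise. The function name downloadTo and its parameters are hypothetical, and choosing between http and https by URL scheme is an addition for convenience, not something the article does:

```javascript
const fs = require('fs')
const http = require('http')
const https = require('https')
const path = require('path')
const { pipeline } = require('stream/promises')

// Hypothetical helper (not from the article): download `url` into `destDir`
// as `filename`. Resolves with the saved file path; rejects on a non-200
// response or on any request/stream error.
function downloadTo(url, destDir, filename) {
  // Pick the client module that matches the URL scheme
  const client = new URL(url).protocol === 'https:' ? https : http
  return new Promise((resolve, reject) => {
    client
      .get(url, (res) => {
        if (res.statusCode !== 200) {
          res.resume() // discard the body so the socket is released
          reject(new Error(`Unexpected status ${res.statusCode} for ${url}`))
          return
        }
        // Create the target directory if it does not exist yet
        fs.mkdirSync(destDir, { recursive: true })
        const target = path.join(destDir, filename)
        // pipeline() pipes the response into the file and rejects if either side fails
        pipeline(res, fs.createWriteStream(target)).then(() => resolve(target), reject)
      })
      .on('error', reject)
  })
}

// Usage (assumes a real, reachable image URL):
// downloadTo('https://example.com/a.jpeg', `${__dirname}/files`, 'img.jpeg')
//   .then((saved) => console.log('Download Completed:', saved))
//   .catch((err) => console.error('Download failed:', err.message))
```

Note that this sketch does not follow redirects; a 301 or 302 response is treated as a failure.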

Method 2: Use DownloaderHelper

npm install node-downloader-helper

The following code downloads an image from a website. An object dl is created from the class DownloaderHelper, which receives two parameters:

The image to be downloaded.
The path where the image must be saved after downloading.

The file variable contains the URL of the image to download, and the filePath variable contains the path of the directory where the file will be saved.

const { DownloaderHelper } = require('node-downloader-helper')

// URL of the image
const file = 'GFG.jpeg'
// Path at which image will be downloaded
const filePath = `${__dirname}/files`

const dl = new DownloaderHelper(file, filePath)

dl.on('end', () => console.log('Download Completed'))
dl.start()

Method 3: Use download

This package is written by the prolific npm author sindresorhus and is very easy to use.

npm install download

The following code downloads an image from a website. The download function receives the file URL and the destination directory.

const download = require('download')

// Url of the image
const file = 'GFG.jpeg'
// Path at which image will get downloaded
const filePath = `${__dirname}/files`

download(file, filePath).then(() => {
  console.log('Download Completed')
})

Final code

I originally wanted to crawl Baidu wallpapers, but the resolution was not high enough and the images had watermarks. Later, a friend in the group found an API, which I guess belongs to a mobile app for high-definition wallpapers. It returns the download URLs directly, so I used it as is.

The following is the complete code:

const download = require('download')
const axios = require('axios')

let headers = {
  'User-Agent':
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 11_1_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36',
}

function sleep(time) {
  return new Promise((resolve) => setTimeout(resolve, time))
}

async function load(skip = 0) {
  const data = await axios
    .get(
      'http://service.picasso.adesk.com/v1/vertical/category/4e4d610cdf714d2966000000/vertical',
      {
        headers,
        params: {
          limit: 30, // each page always returns 30 items
          skip: skip,
          first: 0,
          order: 'hot',
        },
      }
    )
    .then((res) => {
      return res.data.res.vertical
    })
    .catch((err) => {
      console.log(err)
      return [] // on a failed request, continue with an empty page
    })
  await downloadFile(data)
  await sleep(3000)
  if (skip < 1000) {
    load(skip + 30)
  } else {
    console.log('Download finished')
  }
}

async function downloadFile(data) {
  for (let index = 0; index < data.length; index++) {
    const item = data[index]
    // Path at which image will get downloaded
    const filePath = `${__dirname}/美女`
    await download(item.wp, filePath, {
      filename: item.id + '.jpeg',
      headers,
    }).then(() => {
      console.log(`Download ${item.id} Completed`)
      return
    })
  }
}

load()

In the code above, be sure to set a User-Agent header and a 3-second delay between requests. This helps prevent the server from blocking the crawler and returning 403 directly.
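The paging-with-delay idea can also be written as a plain loop instead of a recursive call. The sketch below is one possible shape, not the author's implementation: crawlPages, fetchPage, and the option names are hypothetical, and fetchPage stands in for whatever function requests one page at a given skip offset. It also stops early when a page comes back empty, which the original code does not do, and its stop condition (skip <= maxSkip) differs slightly from the original skip < 1000 check.

```javascript
// Resolve after `ms` milliseconds
function sleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms))
}

// Hypothetical helper: call fetchPage(skip) for skip = 0, limit, 2 * limit, ...
// up to and including maxSkip, waiting delayMs between requests.
// Stops early as soon as a page is empty or missing.
async function crawlPages(fetchPage, { limit = 30, maxSkip = 1000, delayMs = 3000 } = {}) {
  const items = []
  for (let skip = 0; skip <= maxSkip; skip += limit) {
    const page = await fetchPage(skip)
    if (!page || page.length === 0) break
    items.push(...page)
    // Only wait when another request will follow
    if (skip + limit <= maxSkip) await sleep(delayMs)
  }
  return items
}

// Usage with a stand-in page function (replace with a real request):
// crawlPages(async (skip) => [`item-${skip}`], { maxSkip: 60, delayMs: 1000 })
//   .then((items) => console.log(items.length))
```

Passing the page function in as a parameter keeps the pacing logic separate from the HTTP client, so the delay and page size can be tuned without touching the request code.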

Run node index.js directly and the images will be downloaded automatically.

Try it out

Search for "水瓜图" in WeChat mini programs to try it out.

https://p6-juejin.byteimg.com/tos-cn-i-k3u1fbpfcp/c5301b8b97094e92bfae240d7eb1ec5e~tplv-k3u1fbpfcp-zoom-1.awebp?


Statement: This article is reproduced from juejin.cn. If there is any infringement, please contact admin@php.cn to have it removed.
