
Encoding problems when scraping data with a Node.js crawler

WBOY

Release： 2016-05-16 15:51:39


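When cheerio builds a DOM from the fetched page and parses it:

1. If you use the .text() method, HTML entity encoding problems generally do not appear.

2. If you use the .html() method, they appear in many cases (mostly with non-English text), and the output may then need to be decoded.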

Text like the following has to be converted, because the data is going to be stored:


Халк крушит. Новый способ исполнен

Most of the entities follow the pattern &#(x)?\w+;

so a regular expression can do the conversion:

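var body = ....// the data returned by the request, or a string obtained from .html()

// Optionally convert to standard Unicode first (add this step when the returned data contains many literal \u escapes)
body = unescape(body.replace(/\\u/g, "%u"));
// Then decode the numeric entities.
// $1 captures the optional x, which marks a hexadecimal value; $2 captures the digits, which are parsed in the matching radix
body = body.replace(/&#(x)?(\w+);/g, function ($, $1, $2) {
    return String.fromCharCode(parseInt($2, $1 ? 16 : 10));
});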


ok ～

Of course, there are many other versions of this conversion online; use whichever one fits.
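As one possible variant, here is a minimal sketch of the same two steps that avoids the deprecated unescape() and also handles code points above U+FFFF. The helper names decodeUnicodeEscapes and decodeNumericEntities are hypothetical and do not come from the original article or from cheerio; the sample string is made up for illustration. Named entities such as &amp;amp; are deliberately left alone.

```javascript
// Hypothetical helpers for illustration; not part of cheerio or the original article.

// Step 1: turn literal "\uXXXX" sequences (backslash, u, four hex digits) into characters.
function decodeUnicodeEscapes(text) {
  return text.replace(/\\u([0-9a-fA-F]{4})/g, function (match, hex) {
    return String.fromCharCode(parseInt(hex, 16));
  });
}

// Step 2: turn numeric entities such as "&#x425;" (hex) or "&#1061;" (decimal) into characters.
function decodeNumericEntities(text) {
  return text.replace(/&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));/g, function (match, hex, dec) {
    var codePoint = hex !== undefined ? parseInt(hex, 16) : parseInt(dec, 10);
    // Leave the entity untouched if it lies outside the valid Unicode range.
    if (codePoint > 0x10FFFF) {
      return match;
    }
    // fromCodePoint also covers characters beyond U+FFFF, unlike fromCharCode.
    return String.fromCodePoint(codePoint);
  });
}

// Made-up sample input: hex entities, a decimal entity, a \u escape and a named entity.
var sample = '&#x425;&#x430;&#x43B;&#x43A; &#1061; \\u0425 &amp;';
var decoded = decodeNumericEntities(decodeUnicodeEscapes(sample));
console.log(decoded);
```

The order follows the article: \u escapes first, numeric entities second. Restricting the hex branch to hex digits and the decimal branch to decimal digits means malformed input such as &#foo; is simply not matched and passes through unchanged.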

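Postscript:

When scraping web page data with a crawler, the cheerio module is used very often; it is as quick and convenient as jQuery.

(However, some features are not supported or take a different form. For example, jQuery's jQuery('.myClass').prop('outerHTML') is equivalent to jQuery.html('.myClass') in cheerio. See http://www.mgenware.com/blog/?p=2514)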



