python - How do I automatically unescape HTML entities such as '&lt;abc&gt;' in Python 3?
typecho
typecho 2017-06-12 09:27:01

I am new to Python. While using the Scrapy crawler I ran into HTML special characters (entities), so I searched Baidu and found this approach:

import HTMLParser  # Python 2 module name; in Python 3 it is html.parser
html_parser = HTMLParser.HTMLParser()
s = '&lt;abc&gt;&nbsp;'
s = html_parser.unescape(s)

Running it gives:
import markupbase
ImportError: No module named 'markupbase'
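The snippet above uses the Python 2 module name `HTMLParser`; in Python 3 that module became `html.parser`, and the entity decoder is available directly as `html.unescape` (Python 3.4+). A minimal sketch of an import that works on either version, assuming only the standard library is used:

```python
try:
    # Python 3.4+: the standard library exposes the decoder directly
    from html import unescape
except ImportError:
    # Python 2 fallback: the old module name and its parser method
    from HTMLParser import HTMLParser
    unescape = HTMLParser().unescape

print(unescape("&lt;abc&gt;"))  # prints: <abc>
```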


With the help of translation software, I read the official HTMLParser documentation and found a second approach:

from html.parser import HTMLParser

class MyHTMLParser(HTMLParser):

    def handle_data(self, data):
        print(data)
        return data

parser = MyHTMLParser()
s = '&lt;abc&gt;&nbsp;'
ss = parser.feed(s)

The second approach works (it prints the decoded text), but the `return data` line seems to have no effect: I can't get the data back as a value.


Is there a way to solve this unescaping problem in just a few lines of code? If not, how can I get a return value from the second approach?
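For the subclass approach, the parser calls `handle_data` but ignores whatever it returns, and `feed()` itself returns `None`. A minimal sketch, assuming the goal is only the decoded text, stores the chunks on the instance and joins them afterwards (`TextCollector` and `unescape_with_parser` are hypothetical names):

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collects the text passed to handle_data instead of printing it."""

    def __init__(self):
        # convert_charrefs=True (the default since Python 3.5) makes the
        # parser decode entities such as &lt; before calling handle_data.
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        # The parser discards this method's return value, so store the data.
        self.chunks.append(data)

    def get_text(self):
        return "".join(self.chunks)


def unescape_with_parser(s):
    parser = TextCollector()
    parser.feed(s)
    parser.close()  # flush any text still buffered by the parser
    return parser.get_text()


# prints <abc> followed by a non-breaking space
print(unescape_with_parser("&lt;abc&gt;&nbsp;"))
```

Because only `handle_data` is overridden, real tags in the input (for example `<p>`) are dropped and only their text content is kept.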


Answers (1)
某草草
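import html

s = '&lt;abc&gt;&nbsp;'
# html.unescape() (Python 3.4+) replaces HTMLParser().unescape(),
# which was deprecated in 3.4 and removed in Python 3.9
txt = html.unescape(s)
print(txt)
# Result: <abc> (followed by a non-breaking space, '\xa0')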
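Note that `html.unescape` decodes `&nbsp;` to U+00A0 (a non-breaking space), not an ordinary space, which can be surprising when comparing scraped strings. A small sketch of a cleanup helper, assuming plain spaces are wanted (`clean_scraped_text` is a hypothetical name), with `html.escape` shown for the reverse direction:

```python
import html


def clean_scraped_text(s):
    # Decode entities such as &lt;, &gt;, &amp;, &nbsp; and &#60;
    text = html.unescape(s)
    # &nbsp; decodes to U+00A0, not an ASCII space; swap it for a plain space
    return text.replace("\xa0", " ").strip()


print(clean_scraped_text("&lt;abc&gt;&nbsp;"))  # prints: <abc>
# html.escape goes the other way
print(html.escape("<abc>"))  # prints: &lt;abc&gt;
```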