python - How to match all Chinese characters in a string with a regular expression
欧阳克 2017-06-22 11:51:45
这样算吗?121238asdf

The string is shown above and its type is 'str'. I need to extract the Chinese characters with a regular expression. When I used [\u4e00-\u9fa5] before, I still got back a list of English symbols and digits. Please show me the right way to do this, and tell me where I went wrong.

pattern = re.compile(r'[\u4E00-\u9FA5]')
print pattern.findall(x[1])

This is what I wrote, but the result contains no Chinese characters, only the other, non-Chinese characters.


All replies (2)
習慣沉默

I assume here that the text you need to match is s:

pattern = re.compile(ur"[\u4e00-\u9fa5]")
print pattern.findall(s.decode('utf8'))

The decode('utf8') is needed here because the value of s is a sequence of encoded bytes like \x66\x77\x88 rather than Unicode text. Also note the ur prefix on the pattern passed to compile(): u is the Unicode prefix, and r marks a raw string.
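To see why both the decode and the u prefix matter, here is a rough sketch that emulates the Python 2 situation in Python 3, where a Python 2 'str' corresponds to 'bytes'. The byte pattern named broken is my own emulation, based on the assumption that Python 2's re module reads "\u" inside a non-Unicode pattern as a literal "u", which turns "0-\u" into an ordinary ASCII range. The sample string is the one from the question, typed with an ASCII question mark.

```python
import re

# A Python 2 'str' holds raw bytes, which is 'bytes' in Python 3.
s = "这样算吗?121238asdf".encode("utf8")

# Assumption: without the u prefix, "\u" is read as a literal "u", so the
# class behaves like this byte pattern, where "0-u" is a plain ASCII range.
broken = re.compile(rb"[u4E00-u9FA5]")
print(broken.findall(s))   # ASCII symbols, digits and letters only

# Decoding first and matching with a text pattern finds the Chinese characters.
pattern = re.compile("[\u4e00-\u9fa5]")
print(pattern.findall(s.decode("utf8")))
```

Under that assumption, the first call returns only the ASCII part of the string, which matches what the question describes.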

PS: I was inspired by this article.

Update

I just read the reply below. It is true that with Python 3 the output is Unicode. The following is excerpted from here:

Unicode string

In Python 2, ordinary strings are stored as 8-bit byte strings (ASCII by default), while Unicode strings are stored as 16-bit (or wider, depending on the build) Unicode strings, which can represent a much larger character set. The syntax is to prefix the string literal with u.

In Python 3, all strings are Unicode strings.
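The difference between text and its encoded bytes can be seen directly in Python 3. This is only a small illustration; the variable names text and data are mine.

```python
text = "这样算吗"             # str: a sequence of Unicode characters
data = text.encode("utf8")    # bytes: the UTF-8 encoding of the same text

print(len(text))                     # number of characters
print(len(data))                     # number of bytes, three per character here
print(data.decode("utf8") == text)   # decoding restores the original text
```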

    女神的闺蜜爱上我

    You are using Python 2. \uxxxx is a Unicode character, but what you get after matching is a byte string, so printing it shows each byte value.

    Switch to Python 3 and this problem goes away.
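For reference, a minimal Python 3 sketch of the same match, with no decode step needed because str is already Unicode. The helper name find_chinese is hypothetical, and the range is the one used in this thread (U+4E00 to U+9FA5), which does not cover every CJK character.

```python
import re

# Same range as in the thread: U+4E00 to U+9FA5.
CHINESE = re.compile(r"[\u4e00-\u9fa5]")

def find_chinese(text):
    """Return every character of text that falls in the range above."""
    return CHINESE.findall(text)

chars = find_chinese("这样算吗?121238asdf")
print(chars)            # list of single characters
print("".join(chars))   # the same characters joined back into one string
```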
