Regular expression - How to match Chinese Pinyin using Python?
ringa_lee
ringa_lee 2017-05-27 17:39:30
0
3
1636

For example, use regular expressions to match the pinyin of shá.
ps: What I said before may not be clear. I used the word "for example", which means that there is pinyin in the text to be processed, but I don't know what the specific pinyin is. You need to find out these pinyin, and the text to be processed will be in Chinese. , pinyin, symbols (,.: and the like), so please do not answer questions such asre.search(u'shá',text)It needs to be regular, not a simple fixed string. . .

ringa_lee
ringa_lee

ringa_lee

reply all (3)
巴扎黑
import re regex = re.compile(r'\b[a-z]*[āáǎàōóǒòêēéěèīíǐìūúǔùǖǘǚǜüńňǹɑɡ]+[a-z]*\b') text = "Thǐs ís à pìnyin abóut shá" m = regex.findall(text) print(m)

Matching results:
['ís', 'à', 'pìnyin', 'abóut', 'shá']
The first Thǐs is not matched because the default pinyin is all lowercase, excluding uppercase.

    PHPzhong

    Do you want to match all legal pinyin?

    If so, you can find the pinyin index of a dictionary and put all the pinyin|together. It can only be like this, because Pinyin is not defined according to regular rules or some other mechanical rules. This is all you can do if you don’t miss anything and don’t have too many, and there aren’t many anyway.

      伊谢尔伦
      >>> import re >>> d='shá' >>> data='This is a pinyin about shá' >>> re.search(d,data) <_sre.SRE_Match at 0x404e308>
        Latest Downloads
        More>
        Web Effects
        Website Source Code
        Website Materials
        Front End Template
        About us Disclaimer Sitemap
        php.cn:Public welfare online PHP training,Help PHP learners grow quickly!