In fact, for Chinese text, especially in this format, I don't recommend regular expressions, although it can be done with some effort:
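# coding: utf8
import re

filename = '2.txt'
# Capture the first non-space token after the leading number, then a later token
pattern = re.compile(r'^\d+ (\S+).*?(\S+)')

with open(filename) as f:
    for i in f:
        # Strip only the trailing newline, so a last line without one is not truncated
        result = pattern.findall(i.rstrip('\n'))
        if result and len(result[0]) == 2:
            print result[0][0], result[0][1]

# Output:
# 男 北京
# 女 河北
# 男 山东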
You can also use the split method (recommended):
# coding: utf8
filename = '2.txt'

with open(filename) as f:
    for i in f:
        # Split on any whitespace; take the second field and the last field
        result = i.split()
        print result[1], result[-1]

# Output:
# 男 北京
# 女 河北
# 男 山东
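If the file might contain blank or incomplete lines, indexing the split result directly would raise an IndexError. Below is a minimal sketch of a more defensive variant, written for Python 3. The helper name extract_fields and the sample rows are my own assumptions for illustration, since the real contents of 2.txt are not shown; I am guessing that each row is an index, a gender, possibly other columns, and a place.

```python
def extract_fields(lines):
    """Yield (second field, last field) for each line with at least three fields."""
    for line in lines:
        parts = line.split()  # splits on any run of whitespace, drops the newline
        # Assumed layout: index, gender, ..., place. Skip anything shorter.
        if len(parts) < 3:
            continue
        yield parts[1], parts[-1]


# Made-up sample rows (hypothetical; not taken from the real 2.txt).
sample = ["1 男 25 北京\n", "\n", "2 女 30 河北\n", "3 男 28 山东"]

for gender, place in extract_fields(sample):
    print(gender, place)
```

To run it against the real file, something like extract_fields(open('2.txt', encoding='utf-8')) should work, assuming the file is UTF-8 encoded.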