I'm using pyahocorasick in Python to match keywords, which are roughly 10-20 Chinese characters long.
The keywords used to build the automaton are read from a local file named key_word, which has the following format:
Maternal and infant area<Complementary food<Noodles/noodles: infants, toddlers, babies, children, babies | Noodles, thin noodles, thick noodles, handmade noodles, vegetable noodles, nutritious noodles, broken noodles, dried noodles, noodles |
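Because each line packs a category path and a keyword list together, the keywords have to be pulled out before anything is added to the automaton. Below is a minimal parsing sketch. It assumes that `<` separates category levels, the first `:` ends the category path, and keywords are separated by `,` or `|`; full-width `：`, `，` and `｜` are accepted as well, since the original file is Chinese and may use them. The function name `parse_key_word_line` is hypothetical.

```python
import re

# Keyword separators assumed for the key_word file: ASCII and full-width
# comma and vertical bar.
_SEP = re.compile(r"[|,，｜]")


def parse_key_word_line(line):
    """Return (category_path, keywords) for one line, or None if malformed.

    Hypothetical layout: "<"-separated category path, then ":", then
    keywords separated by "," or "|".
    """
    line = line.strip()
    # Split on the first ASCII or full-width colon only.
    parts = re.split(r"[:：]", line, maxsplit=1)
    if len(parts) != 2:
        return None
    category, rest = parts[0].strip(), parts[1]
    keywords = []
    seen = set()
    for kw in _SEP.split(rest):
        kw = kw.strip()
        # Skip empty pieces (e.g. after a trailing "|") and repeated keywords.
        if kw and kw not in seen:
            seen.add(kw)
            keywords.append(kw)
    return category, keywords
```

For the sample line above, it returns the category path `Maternal and infant area<Complementary food<Noodles/noodles` and 13 keywords, with the second `babies` dropped.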
The match result is empty. The code is as follows:
import ahocorasick

A = ahocorasick.Automaton()
title = 'Hello Kitty3色蔬菜细面300克 婴儿幼儿营养面条宝宝辅食面条'
with open('key_word', 'r') as f:
    for line in f:
        line = line.strip()
        line = str(line.split('<'))
        A.add_word(line, line)
A.make_automaton()
aa = A.iter(title)
for item in aa:
    print(item)  # prints nothing
If anyone has run into this problem, please help with sample code or a suggested solution. Thank you!
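The empty result most likely comes from `line = str(line.split('<'))`: it turns the whole line into one string such as `"['Maternal and infant area', 'Complementary food', ...]"`, and that single long string, brackets and quotes included, never occurs in the title, so the automaton has nothing to report. Splitting on `<` also never separates the keywords after the `:` from the category path. Each keyword needs to be registered on its own, with the category carried as its value. The sketch below shows this with a small from-scratch Aho-Corasick automaton (not pyahocorasick itself) so it runs with the standard library only; the class name `MiniAutomaton` and the sample keywords are hypothetical.

```python
from collections import deque


class MiniAutomaton:
    """Minimal Aho-Corasick automaton for illustration (not pyahocorasick)."""

    def __init__(self):
        self.goto = [{}]  # goto[state][char] -> next state; state 0 is the root
        self.fail = [0]   # failure link for each state
        self.out = [[]]   # values of the keywords that end at each state

    def add_word(self, word, value):
        state = 0
        for ch in word:
            nxt = self.goto[state].get(ch)
            if nxt is None:
                nxt = len(self.goto)
                self.goto[state][ch] = nxt
                self.goto.append({})
                self.fail.append(0)
                self.out.append([])
            state = nxt
        self.out[state].append(value)

    def make_automaton(self):
        # Breadth-first pass; depth-1 states keep failure link 0 (the root).
        queue = deque(self.goto[0].values())
        while queue:
            state = queue.popleft()
            for ch, nxt in self.goto[state].items():
                queue.append(nxt)
                # Follow failure links until some state has a ch-transition.
                f = self.fail[state]
                while f and ch not in self.goto[f]:
                    f = self.fail[f]
                self.fail[nxt] = self.goto[f].get(ch, 0)
                # Inherit keywords that are suffixes of this state's string.
                self.out[nxt] = self.out[nxt] + self.out[self.fail[nxt]]

    def iter(self, text):
        """Yield (end_index, value) for every match in text."""
        state = 0
        for i, ch in enumerate(text):
            while state and ch not in self.goto[state]:
                state = self.fail[state]
            state = self.goto[state].get(ch, 0)
            for value in self.out[state]:
                yield i, value


title = 'Hello Kitty3色蔬菜细面300克 婴儿幼儿营养面条宝宝辅食面条'
category = 'Maternal and infant area<Complementary food<Noodles/noodles'

# Fix: one add_word call per keyword, with (category, keyword) as the value.
A = MiniAutomaton()
for kw in ['婴儿', '幼儿', '宝宝', '面条', '细面']:  # hypothetical sample keywords
    A.add_word(kw, (category, kw))
A.make_automaton()
hits = list(A.iter(title))
for end_index, (cat, kw) in hits:
    print(end_index, cat, kw)

# Reproducing the original bug: the list repr becomes one bracketed "keyword".
bad_key = str(category.split('<'))
B = MiniAutomaton()
B.add_word(bad_key, bad_key)
B.make_automaton()
print(list(B.iter(title)))  # → []
```

With pyahocorasick the same fix is one `A.add_word(keyword, (category, keyword))` call per parsed keyword, followed by `A.make_automaton()`; `A.iter(title)` then yields `(end_index, (category, keyword))` tuples.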
Update: after two days of research, I implemented this myself.
The local file contains many repeated keywords, so the matching is not 100% accurate; treat it as a reference only.
The reference code is as follows:
Printed result: Maternal and infant area<Complementary food<Noodles/noodles
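The post does not show how the repeated keywords were handled. One possible approach, offered only as an assumption and not as the author's method, is to store each keyword once, remember every category that lists it, and rank categories by the distinct keywords found in the title, so that a long, specific keyword counts for more than a short one shared by several categories. `build_keyword_index`, `score_categories`, the weighting rule, and the sample categories and keywords are hypothetical. `parsed_lines` is an iterable of `(category, keywords)` pairs; if each keyword is added to the automaton with itself as the value, `found_keywords` can be `[kw for _, kw in A.iter(title)]`.

```python
from collections import defaultdict


def build_keyword_index(parsed_lines):
    """Store each keyword once, with every category that lists it."""
    index = defaultdict(set)
    for category, keywords in parsed_lines:
        for kw in keywords:
            index[kw].add(category)
    return dict(index)


def score_categories(found_keywords, index):
    """Rank categories for one title (hypothetical heuristic).

    Each distinct keyword found adds len(keyword) / number of categories
    sharing it, so specific keywords outweigh widely shared ones.
    """
    scores = defaultdict(float)
    for kw in set(found_keywords):
        cats = index.get(kw, ())
        for cat in cats:
            scores[cat] += len(kw) / len(cats)
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


noodles = 'Maternal and infant area<Complementary food<Noodles/noodles'
toys = 'Maternal and infant area<Toys<Dolls'  # hypothetical second category
index = build_keyword_index([
    (noodles, ['面条', '细面', '宝宝']),
    (toys, ['宝宝', '娃娃']),
])
found = ['细面', '婴儿', '幼儿', '面条', '宝宝', '面条']  # keywords matched in a title
ranking = score_categories(found, index)
print(ranking[0][0])  # → Maternal and infant area<Complementary food<Noodles/noodles
```

Keywords that match but are missing from the index (here `婴儿` and `幼儿`) are simply ignored, and a keyword matched several times in one title is counted once.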