I'm currently segmenting words from each article's title; each title yields about 3 words.
I created a separate tags table to store the segmented words, one record per word. To find related articles, I pick one of an article's tags at random and then search the tags table for other articles with the same tag. This was fine when there was little data, but the tags table now has over 100 million rows and reads are extremely slow.
The tags table has only 2 fields, an article ID and the segmented word, both indexed. The table is also partitioned.
Is there a better way to implement related articles?
Currently about 50,000 new rows are added every day.
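To make the setup concrete, here is a minimal sketch of the pattern described above, using an in-memory SQLite table (the table and column names are assumptions, not the actual schema): one row per (article_id, word), an index on the word column, and a lookup that picks one of the article's tags at random and finds other articles sharing it. This is the per-read query that degrades once the table grows to hundreds of millions of rows.

```python
import sqlite3
import random

# Assumed schema: tags(article_id, word), one row per segmented word,
# with an index on word -- mirroring the setup described in the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tags (article_id INTEGER, word TEXT)")
conn.execute("CREATE INDEX idx_word ON tags(word)")

rows = [(1, "python"), (1, "index"), (1, "speed"),
        (2, "python"), (2, "cache"), (2, "speed"),
        (3, "golang"), (3, "cache"), (3, "speed")]
conn.executemany("INSERT INTO tags VALUES (?, ?)", rows)

def related(article_id):
    # Pick one of the article's words at random, then find other
    # articles sharing that word -- the query pattern that slows
    # down as the tags table grows.
    words = [w for (w,) in conn.execute(
        "SELECT word FROM tags WHERE article_id = ?", (article_id,))]
    tag = random.choice(words)
    return [a for (a,) in conn.execute(
        "SELECT DISTINCT article_id FROM tags WHERE word = ? AND article_id != ?",
        (tag, article_id))]

print(sorted(related(1)))
```

One common way around the read-time cost is to run this lookup offline (e.g. on insert, or in a nightly batch) and store a small precomputed list of related article IDs per article, so reads never touch the big table.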
Relevance should be measured along several dimensions:
1. The section the article belongs to, e.g. entertainment
2. The article's central idea or topic, which needs to be extracted
3. Time, and the main objects involved (people, events)
An article may have multiple main objects, and related articles may cross section boundaries.
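The dimensions above can be combined into a single relevance score. Below is a hedged sketch of one way to do that; all field names, weights, and the time-decay half-life are illustrative assumptions, not a prescribed formula:

```python
from dataclasses import dataclass

# Toy article record covering the dimensions listed above; every field
# name here is an assumption for illustration.
@dataclass
class Article:
    id: int
    section: str        # dimension 1: the section it belongs to
    keywords: set       # dimension 2: extracted topic/theme words
    entities: set       # dimension 3: main objects (people, events)
    day: int            # publication day, as an ordinal number

def relevance(a: Article, b: Article, half_life: float = 30.0) -> float:
    """Weighted combination of the dimensions; weights are arbitrary."""
    score = 0.0
    if a.section == b.section:                 # same section
        score += 1.0
    score += 2.0 * len(a.keywords & b.keywords)  # shared topic words
    score += 3.0 * len(a.entities & b.entities)  # shared main objects
    # Time proximity as exponential decay: halve the score every
    # `half_life` days of distance between publication dates.
    score *= 0.5 ** (abs(a.day - b.day) / half_life)
    return score

a1 = Article(1, "entertainment", {"movie", "award"}, {"actor_x"}, 100)
a2 = Article(2, "entertainment", {"movie"}, {"actor_x"}, 101)
a3 = Article(3, "sports", {"match"}, {"team_y"}, 100)
print(relevance(a1, a2) > relevance(a1, a3))  # → True
```

Because an article can have several main objects and cross sections, a score like this naturally lets cross-section articles rank high when they share enough entities or keywords.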