I wrote a small Python program, about 70 lines long, to calculate document similarity.
The corpus is 88 papers, and I am using the gensim package.
The program preprocesses each document (removes unneeded symbols, segments words, and so on), computes TF-IDF values, and builds a TF-IDF model and a similarity index over the 88 papers. Up to that point everything runs normally, but as soon as I query the index, an error is reported:
What is the cause of this? Thank you~
The following is part of the source code that runs without problems:
texts = [document.split() for document in documents]
dictionary = corpora.Dictionary(texts)
corpus = [dictionary.doc2bow(text) for text in texts]
tfidf_model = models.TfidfModel(corpus)
tfidfs = tfidf_model[corpus]
index = similarities.SparseMatrixSimilarity(tfidfs,num_features=88075)
The problem occurs when running this code:
content = 'A student of music needs as long and as arduous a training to become a performer as a medical student needs to become a doctor'
content = content.lower().split()
test = dictionary.doc2bow(content)
test_tfidf = tfidf_model[test]
sims = index[test_tfidf]  # this is the line where the problem occurs!
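One way to tell whether the failure comes from the logic or from the installed native libraries is to run the same kind of computation without them. The following is only a minimal, dependency-free sketch of TF-IDF plus cosine similarity; the helper names (`build_tfidf`, `to_unit_tfidf`, `cosine`), the three sample documents, and the weighting (raw count times log2(N/df), scaled to unit length, assumed here to resemble gensim's defaults) are illustrative assumptions, not the code from the question.

```python
import math
from collections import Counter


def to_unit_tfidf(tokens, idf):
    """Weight term counts by idf and scale the vector to unit length."""
    # Words that never appeared in the corpus are dropped, as doc2bow would do.
    counts = Counter(t for t in tokens if t in idf)
    weights = {t: c * idf[t] for t, c in counts.items() if idf[t] > 0}
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {t: w / norm for t, w in weights.items()} if norm else {}


def build_tfidf(texts):
    """Return (idf table, unit-length TF-IDF vectors) for a list of token lists."""
    n_docs = len(texts)
    doc_freq = Counter(term for text in texts for term in set(text))
    idf = {term: math.log2(n_docs / df) for term, df in doc_freq.items()}
    return idf, [to_unit_tfidf(text, idf) for text in texts]


def cosine(a, b):
    """Dot product of two unit-length sparse vectors stored as dicts."""
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(t, 0.0) for t, w in a.items())


# Hypothetical stand-in corpus; the real one would be the 88 papers.
documents = [
    "a student of music needs long training",
    "a medical student needs to become a doctor",
    "the weather is nice today",
]
texts = [d.split() for d in documents]
idf, vectors = build_tfidf(texts)

query = to_unit_tfidf("music student training".split(), idf)
sims = [cosine(query, v) for v in vectors]
print(sims)
```

If a pure-Python version like this runs on the same machine while the gensim query still crashes, that would point toward the environment rather than the program logic.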
What is your Python version? Currently
, it is difficult to install both of these on Windows. Even if they do install, there is no guarantee they will work without problems. This error may also be caused by the Windows operating system itself. If you search Google for the error code, you will find many suggested fixes, such as this one:
How to fix 0xc0000417 Error?
http://www.wiki-errors.com/do... Just download and install it, and double-check on Baidu to be safe.
Are you running a pirated copy of the operating system?
Switch to Linux.
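Since the answers above ask about the Python version and the operating system, a short script along these lines could collect that information to post with the question. It is a sketch only: the function name `environment_report` is made up for illustration, and the package list (gensim plus its numerical dependencies numpy and scipy) is an assumption about what is relevant here. It relies on `importlib.metadata`, which requires Python 3.8 or later.

```python
import platform
import sys
from importlib import metadata


def environment_report(packages=("gensim", "numpy", "scipy")):
    """Collect interpreter, OS, and package versions into a dict."""
    report = {
        "python": sys.version.split()[0],
        "architecture": platform.architecture()[0],  # e.g. 32bit or 64bit
        "os": platform.platform(),
    }
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
    return report


for key, value in environment_report().items():
    print(f"{key}: {value}")
```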