How to Calculate Cosine Similarity Between Sentence Strings in Python Without External Libraries?

Linda Hamilton
Release: 2024-10-31 14:30:02

Calculating Cosine Similarity of Sentence Strings without External Libraries

You can compute the cosine similarity of two text strings in plain Python, without any external modules. Each sentence is first turned into a vector of word counts, and the two vectors are then compared with the standard cosine similarity formula:

cos(θ) = (A · B) / (||A|| · ||B||)

Where:

  • A and B are the vectors representing the two sentences (in the implementation below, each vector maps a word to the number of times it occurs).
  • A · B is the dot product of vectors A and B.
  • ||A|| and ||B|| are the respective magnitudes of vectors A and B.
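As a quick illustration of the formula on its own, the sketch below applies it to two equal-length lists of numbers, for example word counts over a fixed vocabulary. The helper name cosine_of_lists and the sample counts are hypothetical; the function simply computes the three terms listed above.

```python
import math


def cosine_of_lists(a, b):
    """Cosine similarity of two equal-length numeric lists (hypothetical helper)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))     # A · B
    norm_a = math.sqrt(sum(x * x for x in a))  # ||A||
    norm_b = math.sqrt(sum(y * y for y in b))  # ||B||
    if norm_a == 0 or norm_b == 0:
        return 0.0  # cosine is undefined for a zero vector; report no similarity
    return dot / (norm_a * norm_b)


# Made-up counts of the words ["foo", "bar", "baz"] in two sentences
print(cosine_of_lists([1, 1, 0], [1, 1, 1]))  # ≈ 0.8165, i.e. 2 / sqrt(6)
```

Returning 0.0 when either vector is all zeros is a convention (the cosine is mathematically undefined there); the implementation below makes the same choice.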

Implementation

The following code implements this formula using only the Python standard library (math, re, and collections.Counter):

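<code class="python">import math
import re
from collections import Counter

# Matches runs of word characters (letters, digits, underscore)
WORD = re.compile(r"\w+")


def get_cosine(vec1, vec2):
    """Return the cosine similarity of two word-count vectors."""
    # Dot product: only words present in both vectors contribute
    intersection = vec1.keys() & vec2.keys()
    numerator = sum(vec1[x] * vec2[x] for x in intersection)

    # Magnitudes: square root of the sum of squared counts
    sum1 = sum(count ** 2 for count in vec1.values())
    sum2 = sum(count ** 2 for count in vec2.values())
    denominator = math.sqrt(sum1) * math.sqrt(sum2)

    # Avoid division by zero when either text contains no words
    if not denominator:
        return 0.0
    return float(numerator) / denominator


def text_to_vector(text):
    """Turn a string into a bag-of-words vector: word -> occurrence count."""
    words = WORD.findall(text)
    return Counter(words)</code>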
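Because text_to_vector keeps words exactly as written, "This" and "this" count as different words. If case should not matter for your texts (an assumption about your use case, not part of the original code), one option is to normalize the text before counting. The name text_to_vector_lower is hypothetical.

```python
import re
from collections import Counter

WORD = re.compile(r"\w+")  # same tokenizer pattern as above


def text_to_vector_lower(text):
    """Case-insensitive bag of words (hypothetical variant of text_to_vector)."""
    # casefold() is a stronger lower() intended for caseless matching
    return Counter(WORD.findall(text.casefold()))


print(text_to_vector_lower("This is THIS"))  # Counter({'this': 2, 'is': 1})
```

The result is still an ordinary Counter, so it can be passed to get_cosine unchanged.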

To use this code, convert the sentence strings into vectors using the text_to_vector function and then calculate the cosine similarity using the get_cosine function:

<code class="python">text1 = "This is a foo bar sentence ."
text2 = "This sentence is similar to a foo bar sentence ."

vector1 = text_to_vector(text1)
vector2 = text_to_vector(text2)

cosine = get_cosine(vector1, vector2)

print("Cosine:", cosine)</code>
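Working this example by hand shows what get_cosine computes. The pattern \w+ skips the trailing ".", so text1 yields six words that each occur once, while text2 yields eight distinct words, with "sentence" occurring twice. The six words of text1 all appear in text2:

```latex
\begin{aligned}
A \cdot B &= 1\cdot 1 + 1\cdot 1 + 1\cdot 1 + 1\cdot 1 + 1\cdot 1 + 1\cdot 2 = 7 \\
\|A\|^2 &= 6, \qquad \|B\|^2 = 7 \cdot 1^2 + 2^2 = 11 \\
\cos\theta &= \frac{7}{\sqrt{6}\,\sqrt{11}} = \frac{7}{\sqrt{66}} \approx 0.8616
\end{aligned}
```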

Running this prints the cosine similarity of the two sentences. Note that the vectors hold raw word counts: tf-idf weighting is not included in this implementation, but it can be added if a suitable corpus is available for computing document frequencies.
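As a rough sketch of that tf-idf extension, the code below weights each word count by an inverse document frequency computed from a small corpus, then compares the weighted vectors with the same cosine formula. It assumes a smoothed idf, log((1 + N) / (1 + df)) + 1, which is one common variant among several; other variants give different numbers. The names idf_weights and tfidf_vector, and the three-sentence corpus, are hypothetical. text_to_vector and get_cosine are repeated so the block runs on its own.

```python
import math
import re
from collections import Counter

WORD = re.compile(r"\w+")


def text_to_vector(text):
    # Bag-of-words counts, as in the article
    return Counter(WORD.findall(text))


def get_cosine(vec1, vec2):
    # Same cosine formula as the article's get_cosine, written compactly
    numerator = sum(vec1[w] * vec2[w] for w in vec1.keys() & vec2.keys())
    norm1 = math.sqrt(sum(v * v for v in vec1.values()))
    norm2 = math.sqrt(sum(v * v for v in vec2.values()))
    return numerator / (norm1 * norm2) if norm1 and norm2 else 0.0


def idf_weights(corpus):
    """Smoothed idf per word: log((1 + N) / (1 + df)) + 1 (assumed variant)."""
    df = Counter()
    for doc in corpus:
        df.update(set(WORD.findall(doc)))  # each word counted once per document
    n_docs = len(corpus)
    return {w: math.log((1 + n_docs) / (1 + d)) + 1 for w, d in df.items()}


def tfidf_vector(text, idf, n_docs):
    """Raw counts multiplied by idf; words missing from the corpus get df = 0."""
    unseen = math.log(1 + n_docs) + 1  # the formula's value when df = 0
    return {w: c * idf.get(w, unseen) for w, c in text_to_vector(text).items()}


corpus = [
    "This is a foo bar sentence .",
    "This sentence is similar to a foo bar sentence .",
    "Nothing in common here .",
]
idf = idf_weights(corpus)
v1 = tfidf_vector(corpus[0], idf, len(corpus))
v2 = tfidf_vector(corpus[1], idf, len(corpus))
score = get_cosine(v1, v2)
print("tf-idf cosine:", score)
```

Words that occur in only one document, such as "similar" and "to", receive larger weights than words shared across documents, so here the score comes out lower than the raw-count cosine of the same two sentences.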

source:php.cn