In database management systems like MySQL, comparing the similarity of text strings is a common requirement. This article explores a versatile approach to calculate the similarity percentage between two strings using MySQL functions.
The Levenshtein distance is a metric that counts the minimum number of single-character edits (insertions, deletions, or substitutions) required to transform one string into another. A smaller distance indicates closer resemblance between the strings, and a distance of 0 means they are identical.
MySQL has no built-in LEVENSHTEIN() function, so the distance has to be calculated by a user-defined stored function. Once such a function exists, the similarity percentage can be obtained with the following formula:
Similarity Percentage = (1 - (Levenshtein Distance / Length of Longest String)) * 100
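As a quick check of the formula, the arithmetic can be run directly in MySQL without any user-defined function. The strings 'kitten' and 'sitting' are an illustrative pair that is not part of the original example; their Levenshtein distance of 3 is typed in by hand.

```sql
-- Worked example of the formula only; the distance (3) is entered manually.
-- 'kitten' has 6 characters and 'sitting' has 7, so the longest length is 7.
SELECT ROUND(
         (1 - 3 / GREATEST(CHAR_LENGTH('kitten'), CHAR_LENGTH('sitting'))) * 100,
         2
       ) AS similarity_pct;
```

This evaluates to about 57.14: three edits against a longest length of seven leaves a little over half of the characters in agreement.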
To implement this approach in MySQL, create the following two functions:
LEVENSHTEIN() Function:
CREATE FUNCTION `LEVENSHTEIN`(s1 TEXT, s2 TEXT) RETURNS INT(11) DETERMINISTIC
BEGIN
  # ... Function implementation ...
END;
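The function body is elided above. As a rough illustration of what it might contain, the following is a minimal sketch of a row-by-row dynamic-programming implementation, adapted from a widely circulated MySQL stored function rather than taken from this article. It assumes inputs of at most 255 characters (hence VARCHAR(255) instead of TEXT), because each distance is stored in a single byte. The DELIMITER lines are a mysql command-line client feature, and the variable names are illustrative.

```sql
DELIMITER $$

CREATE FUNCTION LEVENSHTEIN(s1 VARCHAR(255), s2 VARCHAR(255))
RETURNS INT
DETERMINISTIC
BEGIN
  DECLARE s1_len, s2_len, i, j, c, c_temp, cost INT;
  DECLARE s1_char CHAR;
  -- cv1 holds the previous row of the distance matrix, cv0 the row being built.
  -- One byte per cell, which is why inputs are limited to 255 characters.
  DECLARE cv0, cv1 VARBINARY(256);

  SET s1_len = CHAR_LENGTH(s1), s2_len = CHAR_LENGTH(s2),
      cv1 = 0x00, j = 1, i = 1, c = 0;

  IF s1 = s2 THEN
    RETURN 0;
  ELSEIF s1_len = 0 THEN
    RETURN s2_len;
  ELSEIF s2_len = 0 THEN
    RETURN s1_len;
  ELSE
    -- Row 0: turning an empty prefix into the first j characters costs j.
    WHILE j <= s2_len DO
      SET cv1 = CONCAT(cv1, UNHEX(HEX(j))), j = j + 1;
    END WHILE;

    WHILE i <= s1_len DO
      SET s1_char = SUBSTRING(s1, i, 1), c = i, cv0 = UNHEX(HEX(i)), j = 1;
      WHILE j <= s2_len DO
        -- Start from the insertion cost (cell to the left + 1).
        SET c = c + 1;
        IF s1_char = SUBSTRING(s2, j, 1) THEN
          SET cost = 0;
        ELSE
          SET cost = 1;
        END IF;
        -- Substitution: diagonal cell plus 0 or 1.
        SET c_temp = CONV(HEX(SUBSTRING(cv1, j, 1)), 16, 10) + cost;
        IF c > c_temp THEN
          SET c = c_temp;
        END IF;
        -- Deletion: cell above + 1.
        SET c_temp = CONV(HEX(SUBSTRING(cv1, j + 1, 1)), 16, 10) + 1;
        IF c > c_temp THEN
          SET c = c_temp;
        END IF;
        SET cv0 = CONCAT(cv0, UNHEX(HEX(c))), j = j + 1;
      END WHILE;
      SET cv1 = cv0, i = i + 1;
    END WHILE;
  END IF;

  RETURN c;
END$$

DELIMITER ;
```

Characters are compared with the = operator, so whether 'S' and 's' count as equal depends on the collation in effect. Keeping only two rows of the matrix keeps memory use proportional to the length of the second string.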
LEVENSHTEIN_RATIO() Function:
CREATE FUNCTION `LEVENSHTEIN_RATIO`(s1 TEXT, s2 TEXT) RETURNS INT(11) DETERMINISTIC
BEGIN
  # ... Function implementation ...
END;
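The ratio function is a thin wrapper around the formula given earlier. The body below is a minimal sketch, not necessarily the implementation this article has in mind. It assumes a LEVENSHTEIN() stored function already exists, and the guard for two empty strings is an added assumption to avoid dividing by zero.

```sql
DELIMITER $$

CREATE FUNCTION LEVENSHTEIN_RATIO(s1 TEXT, s2 TEXT)
RETURNS INT
DETERMINISTIC
BEGIN
  DECLARE max_len INT;
  -- Length of the longest string, counted in characters rather than bytes.
  SET max_len = GREATEST(CHAR_LENGTH(s1), CHAR_LENGTH(s2));

  -- Assumption: two empty strings are treated as 100% similar.
  IF max_len = 0 THEN
    RETURN 100;
  END IF;

  -- Similarity Percentage = (1 - distance / longest length) * 100, rounded.
  RETURN ROUND((1 - LEVENSHTEIN(s1, s2) / max_len) * 100);
END$$

DELIMITER ;
```

If either argument is NULL, this sketch returns NULL, which matches how most MySQL string functions behave.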
Consider the following example:
SET @a = "Welcome to Stack Overflow";
SET @b = "Hello to stack overflow";
The query to calculate the similarity percentage between @a and @b would be:
SELECT LEVENSHTEIN_RATIO(@a, @b) AS SimilarityPercentage;
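The value returned depends on how the two functions are implemented and on the collation used when characters are compared. With a character-level Levenshtein distance, the two strings are 4 edits apart when case is ignored and 6 edits apart when case matters, and the longer string has 25 characters. The formula therefore gives 84% in the case-insensitive case and 76% in the case-sensitive case.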
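The same function can also rank stored values against a search string. The sketch below assumes a hypothetical table named articles with a title column; neither appears in the original example.

```sql
-- Hypothetical table and column names, for illustration only.
SET @a = 'Welcome to Stack Overflow';

SELECT title,
       LEVENSHTEIN_RATIO(title, @a) AS similarity_pct
FROM articles
ORDER BY similarity_pct DESC
LIMIT 5;
```

Because the stored function runs once per row and cannot use an index, a query like this scans the whole table, so it is best suited to small tables or to a candidate set that has already been narrowed by a WHERE clause.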