
Find similar text using regular expressions

王林
Release: 2024-02-14 19:03:08

PHP editor Youzi: Regular expressions are a powerful text-matching tool that can help us quickly find similar text. They play an important role in string processing, data extraction, and input validation, and their flexibility lets us handle complex text operations concisely, which greatly improves development efficiency. Whether you are a beginner or an experienced developer, mastering regular expressions is an essential skill. Let's explore them together!

Question content

I have recognized a list of texts from different PDF documents. Now I need to extract some values from each text using regular expressions. Some of my patterns look like this:

some text[ -]?(.+)[ ,-]+some other text

But the problem is that some letters may be wrong after recognition ("0" instead of "o", "i" instead of "l", etc.). That's why my patterns don't match.
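To make the failure concrete, here is a minimal Python sketch with the standard `re` module. The phrases come from the question and the sample strings are made up for illustration. The capture group is lazy (`(.+?)`, as in the answer's pattern) so the comma is not captured.

```python
import re

# The question's pattern, but with a lazy (.+?) group so the value stops
# before the ", " separator instead of swallowing the comma.
PATTERN = re.compile(r'some text[ -]?(.+?)[ ,-]+some other text')

def extract(text):
    """Return the captured value, or None if the exact pattern does not match."""
    m = PATTERN.search(text)
    return m.group(1) if m else None

print(extract('some text my_value, some other text'))  # my_value
print(extract('s0me text my_value, some other text'))  # None ("0" instead of "o")
```

A single OCR substitution is enough to make the exact pattern find nothing.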

I'd like a regular expression that supports something like Jaro-Winkler or Levenshtein similarity, so that I can extract my_value from s0me text my_value, some other text and similar strings.

I know this sounds like a lot to ask, but maybe there is a solution to this problem.

By the way, I'm using Java, but solutions in other languages are acceptable.
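The Levenshtein distance mentioned in the question counts the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into another. A short plain-Python sketch (the function name `levenshtein` is our own, not a library call) shows that each OCR slip in the question's example costs exactly one edit:

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    # prev[j] holds the distance between the current prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or keep) ca
        prev = curr
    return prev[-1]

print(levenshtein('s0me text', 'some text'))              # 1
print(levenshtein('some otner text', 'some other text'))  # 1
print(levenshtein('kitten', 'sitting'))                   # 3
```

A tolerance of one or two edits per phrase is the same kind of allowance that an error limit such as `{e<=2}` in Python's `regex` module expresses.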

Workaround

If you use Python's regex module (the third-party package, not the built-in re), you can use fuzzy matching. The following regular expression allows up to 2 errors per phrase. You can also write more fine-grained error constraints (separately limiting insertions, substitutions, and deletions); see the module's documentation for details.

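import regex  # third-party module: pip install regex

txt = 's0me text my_value, some otner text'
# {e<=2} lets each phrase match with at most 2 errors (insertions, deletions or substitutions);
# the lazy (.+?) stops the captured value before the separator.
pattern = regex.compile(r'(?:some text){e<=2}[ -]?(.+?)[ ,-]+(?:some other text){e<=2}')

m = pattern.search(txt)
if m is not None:
    print(m.group(1))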

Output:

my_value
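If the third-party `regex` module is not available, a similar effect can be approximated with the standard library alone, and the same dynamic-programming loop ports directly to Java. The sketch below swaps in a different technique than the answer: it uses Sellers' approximate-substring variant of the Levenshtein table to locate each fixed phrase with at most `max_errors` edits, then returns the text between the two phrases. `find_fuzzy` and `extract_between` are hypothetical helper names, and trimming `' ,-'` from the result is an assumption that mirrors the separators in the question's pattern.

```python
from typing import Optional, Tuple

def find_fuzzy(pattern: str, text: str, max_errors: int) -> Optional[Tuple[int, int, int]]:
    """Locate `pattern` in `text` allowing up to `max_errors` edits.

    Returns (start, end, errors) for the best occurrence (fewest errors,
    then leftmost end), or None if no substring is close enough.
    """
    m = len(pattern)
    # cost[i]: edits to align pattern[:i] with a substring of text ending at
    # the current position; start[i]: where that substring begins.
    cost = list(range(m + 1))
    start = [0] * (m + 1)
    best = None
    for j, ch in enumerate(text, start=1):
        new_cost = [0] * (m + 1)   # empty pattern prefix matches for free
        new_start = [j] * (m + 1)  # ...as an empty substring starting at j
        for i in range(1, m + 1):
            candidates = [
                (cost[i - 1] + (pattern[i - 1] != ch), start[i - 1]),  # match or substitute
                (cost[i] + 1, start[i]),                              # extra char in text
                (new_cost[i - 1] + 1, new_start[i - 1]),              # pattern char missing
            ]
            new_cost[i], new_start[i] = min(candidates, key=lambda c: c[0])
        cost, start = new_cost, new_start
        if cost[m] <= max_errors and (best is None or cost[m] < best[2]):
            best = (start[m], j, cost[m])
    return best

def extract_between(text: str, left: str, right: str, max_errors: int = 2) -> Optional[str]:
    """Return the text between fuzzy matches of `left` and `right`, trimmed of ' ,-'."""
    first = find_fuzzy(left, text, max_errors)
    if first is None:
        return None
    offset = first[1]  # search for the right phrase only after the left one
    second = find_fuzzy(right, text[offset:], max_errors)
    if second is None:
        return None
    return text[offset:offset + second[0]].strip(' ,-')

print(extract_between('s0me text my_value, some otner text', 'some text', 'some other text'))  # my_value
```

Unlike the `regex` module, this only tolerates errors in literal phrases, not in arbitrary regex syntax, and each phrase search costs time proportional to phrase length times text length.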

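The regular expression pattern (?i)(some\s*\w*\s*text\s*)([^,]+) is used to capture phrases similar to "some text" (case-insensitive, with optional whitespace and an optional word between "some" and "text"), followed by any characters before the comma.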
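A minimal Python sketch of that pattern with the standard `re` module, assuming the final group is `[^,]+` (one or more non-comma characters). The sample strings are made up for illustration and show what the pattern tolerates and what it does not:

```python
import re

# Case-insensitive; \s* tolerates extra spaces and \w* one optional word between
# "some" and "text". Group 2 is everything after the phrase up to the next comma.
PATTERN = re.compile(r'(?i)(some\s*\w*\s*text\s*)([^,]+)')

def capture_value(text):
    """Return group 2 of the first match, or None if the phrase is not found."""
    m = PATTERN.search(text)
    return m.group(2) if m else None

print(capture_value('some text my_value, some other text'))  # my_value
print(capture_value('SOME  TEXT my_value, rest'))             # my_value
print(capture_value('s0me text my_value, rest'))              # None
```

Case and spacing variations are handled, but a character substitution such as "s0me" still prevents a match, which is the case the fuzzy `regex` approach covers.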


Source: stackoverflow.com
Statement of this Website
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn