How can I make this regex simpler?
P粉710454910 2024-02-26 18:49:18

I have this regular expression:

"(WORD1.*WORD2.*WORD3)|(WORD1.*WORD3.*WORD2)|(WORD2.*WORD1.*WORD3)|(WORD2.*WORD3.*WORD1)|(WORD3.*WORD1.*WORD2)|(WORD3.*WORD2.*WORD1)"

It matches these strings:

WORD1WORD2WORD3
WORD1AWORD2BWORD3C
WORD3WORD1WORD2
WORD1WORD2WORD3WORD1

But not these strings:

WORD1WORD1WORD2
WORD1AWORD1BWORD2C

This regex matches any string that contains all three words (WORD1, WORD2, WORD3) in any order.

I want to do the same thing with more words, but the problem is that the regex needs one alternative per ordering of the words, so its size grows factorially with the number of words. Is it possible to simplify the way this regex is constructed so that its size does not blow up like this?


All replies (2)
P粉663883862

Simply iterate over all strings and filter out all strings that do not contain all keywords:

(A more concise version can be found in the code snippet below)

function findMatch(strings, keywords) {
  const result = [];
  
  for (const string of strings) {
    if (keywords.every(keyword => string.includes(keyword))) {
      result.push(string);
    }
  }
  
  return result;
}

Try it:

function findMatch(strings, keywords) {
  return strings.filter(
    string => keywords.every(keyword => string.includes(keyword))
  );
}

const testcases = [
  'WORD1WORD2WORD3',
  'WORD1AWORD2BWORD3C',
  'WORD3WORD1WORD2',
  'WORD1WORD2WORD3WORD1',
  'WORD1WORD1WORD2',
  'WORD1AWORD1BWORD2C'
];

const keywords = [
  'WORD1', 'WORD2', 'WORD3'
];

console.log(findMatch(testcases, keywords));

P粉998100648

You can use a positive lookahead for each word.

/(?=.*WORD1)(?=.*WORD2)(?=.*WORD3).*/

The more performant version below adds a start anchor and matches only a single character after the lookaheads have been validated. This technique is only suitable for testing whether a string matches, which is what the OP asked for, not for extracting the words.

/^(?=.*WORD1)(?=.*WORD2)(?=.*WORD3)./

A positive lookahead works like a gate: matching continues only if the pattern inside the parentheses can be found, but the lookahead neither consumes nor captures what it matches; it is always zero-length. Because each lookahead begins with .*, each word may appear anywhere after the current position, so the order of the words doesn't matter. If every lookahead succeeds, matching proceeds without any characters having been consumed.
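
As a rough sketch of how the anchored lookahead pattern could be generated for any number of words: the helper names `escapeRegExp` and `buildAllWordsRegex` are hypothetical (they do not come from the answer), and the escaping step is an added assumption so that keywords containing regex metacharacters are treated literally. The pattern gets one lookahead per keyword, so its length grows linearly with the number of keywords.

```javascript
// Hypothetical helper: escape regex metacharacters so a keyword is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: build ^(?=.*K1)(?=.*K2)...(?=.*Kn). with one lookahead per keyword.
function buildAllWordsRegex(keywords) {
  const lookaheads = keywords
    .map(keyword => `(?=.*${escapeRegExp(keyword)})`)
    .join('');
  return new RegExp(`^${lookaheads}.`);
}

const allWords = buildAllWordsRegex(['WORD1', 'WORD2', 'WORD3']);

console.log(allWords.source);                     // ^(?=.*WORD1)(?=.*WORD2)(?=.*WORD3).
console.log(allWords.test('WORD3WORD1WORD2'));    // true
console.log(allWords.test('WORD1AWORD1BWORD2C')); // false
```

Note that `.` does not match line terminators by default, so for content spanning several lines the `s` (dotAll) flag would probably be needed.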

If you only care about whether the content matches, the only substantial difference between the two expressions is the time they take. Let's say your content contains only 2 of the 3 required words. Unless the software interpreting the expression can recognize that the attempt is futile, it may look for the three words from the first position and fail, then try again from the second position and fail, and so on until it reaches the last position and gives up. By specifying ^, only the first position is checked, which saves the time spent on those unnecessary attempts. Removing the * from the end avoids some unnecessary matching when you only want the true/false answer to whether all the words are present in the content.
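
A small sketch comparing the two expressions from this answer on two of the question's sample strings. The variable names are illustrative only. It does not measure time; it only shows that both give the same true/false answer while the matched text differs.

```javascript
const unanchored = /(?=.*WORD1)(?=.*WORD2)(?=.*WORD3).*/;
const anchored = /^(?=.*WORD1)(?=.*WORD2)(?=.*WORD3)./;

const complete = 'WORD1AWORD2BWORD3C';   // contains all three words
const incomplete = 'WORD1AWORD1BWORD2C'; // WORD3 is missing

// Both expressions give the same true/false answer.
console.log(unanchored.test(complete), anchored.test(complete));     // true true
console.log(unanchored.test(incomplete), anchored.test(incomplete)); // false false

// The matched text differs: the trailing .* consumes the rest of the line,
// while the single . consumes only one character.
console.log(unanchored.exec(complete)[0]); // WORD1AWORD2BWORD3C
console.log(anchored.exec(complete)[0]);   // W
```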
