This depends on the size of your data. If the data volume is small, you can simply keep the keywords in Redis or a configuration file; each time you crawl a page, pull out all the keywords and run the replacements.
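A minimal sketch of that simple approach, assuming the keywords come from a config file or Redis (a hard-coded list stands in here so the sketch is self-contained):

```python
def load_keywords():
    # In practice this would read from Redis (e.g. SMEMBERS on a set)
    # or a configuration file; a literal list keeps the sketch runnable.
    return ["spam", "ad"]

def filter_page(text, keywords):
    # Naive pass: one str.replace per keyword. Fine for a small list,
    # but the cost grows linearly with the number of keywords.
    for kw in keywords:
        text = text.replace(kw, "")
    return text

page = "some spam content with an ad inside"
print(filter_page(page, load_keywords()))
```

This is perfectly adequate for a few hundred keywords; the rest of this answer is about why it stops being adequate.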
But since you are running a web crawler, if both the keyword set and the strings to be filtered are very large, even regular expressions will be worryingly slow.
For example, say you have 100,000 keywords to filter out, and suppose you can combine them into 50,000 regular expressions (setting aside whether you would write that many by hand or generate them automatically). Each crawled string is very long, and matching every pattern against it means looping at least 50,000 times per page. I don't think this naive approach is workable.
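One common way to avoid looping over tens of thousands of patterns is to compile all keywords into a single alternation regex, so each crawled string is scanned in one pass. This is only a sketch with a toy keyword list; for truly huge keyword sets even one big regex degrades, and a trie-based multi-pattern matcher (Aho-Corasick) scales better:

```python
import re

keywords = ["foo", "bar", "baz"]  # stand-in for the 100,000 keywords

# Sort longest-first so longer keywords win over their own prefixes,
# and escape each keyword so it is matched literally.
pattern = re.compile(
    "|".join(re.escape(k) for k in sorted(keywords, key=len, reverse=True))
)

text = "foo and bar appear in this crawled string"
print(pattern.sub("", text))  # removes every keyword in a single scan
```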
Just a personal suggestion: you can refer to this article: http://blog.jobbole.com/99910/ It explains how to segment keywords and build a keyword index for more efficient queries, using Stack Overflow's tag engine as its example.
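The indexing idea can be roughly sketched as a trie over the keywords: build the index once, then at each position of the text walk the trie instead of testing every keyword separately. This is my own illustration of the concept, not the article's code; a production version would add Aho-Corasick failure links so the scan is linear in the text length:

```python
def build_trie(keywords):
    # Nested dicts as trie nodes; "$" marks the end of a keyword.
    root = {}
    for kw in keywords:
        node = root
        for ch in kw:
            node = node.setdefault(ch, {})
        node["$"] = kw
    return root

def find_keywords(text, root):
    # At each starting position, follow the trie as far as it matches
    # and record every complete keyword seen along the way.
    hits = []
    for i in range(len(text)):
        node = root
        for ch in text[i:]:
            if ch not in node:
                break
            node = node[ch]
            if "$" in node:
                hits.append((i, node["$"]))
    return hits

trie = build_trie(["abc", "ab", "bcd"])
print(find_keywords("xabcd", trie))  # → [(1, 'ab'), (1, 'abc'), (2, 'bcd')]
```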
Or consider a heavyweight tool like ElasticSearch... obviously there is no way to cover that in the few dozen words here.
What the person above said is correct, but if the data is small, you can consider using any