ASP.NET: deleting specified HTML tags with a regular expression

高洛峰
Release: 2017-02-03 15:14:11
Original

Deleting every HTML tag can make the content hard to read (for example, a and img tags carry links and images), so it is better to delete some tags and keep others.

In regular expressions, testing whether text contains certain strings is easy to understand, but testing whether it does NOT contain certain strings (whole strings, not single characters; several of them, not just one) is genuinely puzzling.
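The difference between excluding characters and excluding a string shows up in a small sketch. It assumes a hypothetical helper class, LookaheadDemo, which is not part of the original article, and it only inspects the characters right after <.

```csharp
using System.Text.RegularExpressions;

// Hypothetical demo class: contrasts a negated character class with a negative lookahead.
public static class LookaheadDemo
{
    // Negated character class: [^li] excludes the single characters 'l' and 'i',
    // so it rejects every tag name that STARTS with either letter (<li>, <img>, ...).
    public static bool CharClassAccepts(string tag) =>
        Regex.IsMatch(tag, @"^<[^li]");

    // Negative lookahead: (?!li) excludes the two-character STRING "li",
    // so <img> is accepted while <li> is rejected.
    // Note: it also rejects <link>; the \b correction later in this article addresses that.
    public static bool LookaheadAccepts(string tag) =>
        Regex.IsMatch(tag, @"^<(?!li)");
}
```

With these helpers, CharClassAccepts("<img>") is false because 'i' is in the excluded set, whereas LookaheadAccepts("<img>") is true because "im" is not "li".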

<(?!((/?\s?li)|(/?\s?ul)|(/?\s?a)|(/?\s?img)|(/?\s?br)|(/?\s?span)|(/?\s?b)))[^>]+>

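This pattern matches any HTML tag whose name is not li / ul / a / img / br / span / b. For the requirement above, every HTML tag other than those listed must be deleted. It took me a long time to work this out.
(?!exp) is a negative lookahead: it matches a position that is not followed by exp.
/?\s? must sit inside the lookahead. I first tried putting it directly after the leading <, outside the lookahead, but that failed in testing, most likely because /? is optional: for a closing tag such as </a>, the engine can backtrack, match /? as empty, and the lookahead then sees "/a" rather than "a", so the closing tag gets deleted.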
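The placement issue can be checked with a minimal sketch. The class PlacementDemo is hypothetical, and the Outside pattern is only an assumed reconstruction of the failed attempt, since the original article does not show it; both patterns keep a and b tags.

```csharp
using System.Text.RegularExpressions;

// Hypothetical demo class comparing where /?\s? is placed.
public static class PlacementDemo
{
    // Working form: /?\s? sits INSIDE the lookahead, so the check always
    // reaches the tag name, whether or not the tag is a closing tag.
    public const string Inside = @"<(?!((/?\s?a)|(/?\s?b)))[^>]+>";

    // Assumed failing form: /?\s? sits after < but OUTSIDE the lookahead.
    // Because /? is optional, the engine can backtrack and match it as empty;
    // the lookahead then sees "/a", which is not "a", so it succeeds.
    public const string Outside = @"</?\s?(?!(a|b))[^>]+>";

    // Deletes every tag the given pattern matches (case-insensitive).
    public static string Strip(string html, string pattern) =>
        Regex.Replace(html, pattern, "", RegexOptions.IgnoreCase);
}
```

Applied to <p><a href='x'>link</a></p>, Inside leaves <a href='x'>link</a>, while Outside also deletes the closing </a> and leaves <a href='x'>link.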

The following simple function joins the tags to keep into a regular expression and then deletes every tag that is not on that list:

// Requires: using System.Text.RegularExpressions;
private static string RemoveSpecifyHtml(string ctx) {
    string[] holdTags = { "a", "img", "br", "strong", "b", "span" }; // tags to keep
    // Hand-written equivalent for li/ul/a/img/br/span/b:
    // <(?!((/?\s?li)|(/?\s?ul)|(/?\s?a)|(/?\s?img)|(/?\s?br)|(/?\s?span)|(/?\s?b)))[^>]+>
    string regStr = string.Format(@"<(?!((/?\s?{0})))[^>]+>", string.Join(@")|(/?\s?", holdTags));
    // Compiled is dropped because the pattern is rebuilt on every call;
    // Multiline is dropped because the pattern uses no ^ or $ anchors.
    Regex reg = new Regex(regStr, RegexOptions.IgnoreCase);
    return reg.Replace(ctx, "");
}
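To see what the string.Join call actually builds, here is a minimal sketch that factors the construction into a helper. The class PatternBuilder and its methods are hypothetical, introduced only for illustration; the format string and separator are the ones RemoveSpecifyHtml uses.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper mirroring the pattern construction in RemoveSpecifyHtml.
public static class PatternBuilder
{
    // The separator closes one alternative and opens the next; the format
    // string supplies the first "(/?\s?" and the closing ")))".
    public static string Build(string[] holdTags) =>
        string.Format(@"<(?!((/?\s?{0})))[^>]+>", string.Join(@")|(/?\s?", holdTags));

    // Deletes every tag whose name does not start with one of holdTags.
    public static string Strip(string html, string[] holdTags) =>
        Regex.Replace(html, Build(holdTags), "", RegexOptions.IgnoreCase);
}
```

For example, Build(new[] { "a", "img" }) returns <(?!((/?\s?a)|(/?\s?img)))[^>]+>, the same shape as the hand-written pattern. Because nothing follows each name, a kept name also keeps every tag it prefixes: Strip("<abbr>x</abbr>", new[] { "a" }) leaves both abbr tags in place, which is the problem the correction below deals with.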

Correction:
With the regular expression above, if you keep li you will find in practice that link is kept too; likewise, keeping a also keeps tags such as abbr and address, and keeping b also keeps blockquote and body. The fix is to add a \b (word boundary) assertion after each tag name:

<(?!((/?\s?li\b)|(/?\s?ul\b)|(/?\s?a\b)|(/?\s?img\b)|(/?\s?br\b)|(/?\s?span\b)|(/?\s?b\b)))[^>]+>

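// Requires: using System.Text.RegularExpressions;
private static string RemoveSpecifyHtml(string ctx) {
    string[] holdTags = { "a", "img", "br", "strong", "b", "span", "li" }; // tags to keep
    // Hand-written equivalent for li/ul/a/img/br/span/b:
    // <(?!((/?\s?li\b)|(/?\s?ul\b)|(/?\s?a\b)|(/?\s?img\b)|(/?\s?br\b)|(/?\s?span\b)|(/?\s?b\b)))[^>]+>
    // \b must follow {0} as well as sit in the separator; with the separator alone,
    // the last tag ("li") gets no boundary and <link> would still be kept.
    string regStr = string.Format(@"<(?!((/?\s?{0}\b)))[^>]+>", string.Join(@"\b)|(/?\s?", holdTags));
    // Compiled/Multiline omitted: the pattern is rebuilt per call and uses no ^ or $.
    Regex reg = new Regex(regStr, RegexOptions.IgnoreCase);
    return reg.Replace(ctx, "");
}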
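One detail of the \b fix is easy to miss: if \b is placed only in the string.Join separator, the last tag in the array gets no boundary and still matches as a prefix. The sketch below contrasts that construction with one that also appends \b after {0}. The class BoundaryDemo and its method names are hypothetical.

```csharp
using System.Text.RegularExpressions;

// Hypothetical demo class: where \b goes when building the pattern with string.Join.
public static class BoundaryDemo
{
    // \b in the separator AND after {0}: every tag name, including the last,
    // must end at a word boundary, so "li" no longer matches "link".
    public static string BuildFixed(string[] holdTags) =>
        string.Format(@"<(?!((/?\s?{0}\b)))[^>]+>", string.Join(@"\b)|(/?\s?", holdTags));

    // \b only in the separator: the last tag name has no boundary,
    // so it still matches any tag it prefixes.
    public static string BuildSeparatorOnly(string[] holdTags) =>
        string.Format(@"<(?!((/?\s?{0})))[^>]+>", string.Join(@"\b)|(/?\s?", holdTags));

    // Deletes every tag the given pattern matches (case-insensitive).
    public static string Strip(string html, string pattern) =>
        Regex.Replace(html, pattern, "", RegexOptions.IgnoreCase);
}
```

With holdTags { "a", "li" } and the input <link rel='x'><li>item</li><abbr>y</abbr>, BuildFixed yields a pattern that leaves <li>item</li>y, while BuildSeparatorOnly also keeps <link rel='x'>, because "li" is last in the array. In both cases abbr is removed, since a\b cannot match "ab".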

source:php.cn
Statement of this Website
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn