I have some HTML that is messed up by spaces within tags and want to make it valid again. For example:

< div class='test' >1 > 0 is < b >true</ b> and apples >>> bananas< / div >

should be converted to valid HTML:

<div class='test'>1 > 0 is <b>true</b> and apples >>> bananas</div>

Any text where > or < is preceded/followed by spaces should remain unchanged. For example, 1 > 0 should be retained instead of being compressed to 1>0.

I realize this may require several regular expressions, which is fine. Here is what I have so far:

<\s?\/\s*

This partially fixes closing tags such as </ b> and < / div > (the space before > remains), but it does not touch the opening tags. I could take a more drastic approach, but that would also break the text between the tags, not just the tag names themselves.
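The following Python sketch (Python is chosen only for illustration) shows both problems on the sample input. It assumes the pattern above is meant to be replaced with </, and that the closing bold tag in the sample is written </ b>. The function naive_strip is a hypothetical stand-in for the "drastic approach": it strips every space that touches < or >.

```python
import re

# Sample input; the exact form of the closing "</ b>" tag is an assumption.
BROKEN = "< div class='test' >1 > 0 is < b >true</ b> and apples >>> bananas< / div >"

def fix_closing_slash(html: str) -> str:
    """Collapse '</ ', '< /' and '< / ' into '</' (only affects closing tags)."""
    return re.sub(r"<\s?/\s*", "</", html)

def naive_strip(html: str) -> str:
    """Hypothetical drastic approach: drop all whitespace around every < and >."""
    return re.sub(r"\s*([<>])\s*", r"\1", html)

print(fix_closing_slash(BROKEN))
# < div class='test' >1 > 0 is < b >true</b> and apples >>> bananas</div >
print(naive_strip(BROKEN))
# <div class='test'>1>0 is<b>true</ b>and apples>>>bananas</ div>
```

The first call leaves the opening tags and the space in </div > alone. The second fixes the opening tags but rewrites 1 > 0 as 1>0 and apples >>> bananas as apples>>>bananas, and it still leaves the space inside </ b>.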
There is no reasonable way to save a document as corrupted as the one you posted, but assuming you replace the > and similar characters in the text with their corresponding entities (e.g. &gt;), you can feed the document into an appropriate library, such as DomDocument, which will handle the rest.
This regular expression also works:

/(<)\s*(\/?)\s*([^<>]*\S)\s*(>)/g

It splits each tag into four captured parts and replaces the whole match with those parts joined together ($1$2$3$4), which drops the spaces in between. Breakdown:
(<) - captures the opening angle bracket (group 1)
\s* - matches any whitespace
(\/?) - captures an optional forward slash (group 2)
\s* - matches any whitespace after the slash
([^<>]*\S) - captures the tag content without trailing whitespace (group 3)
\s* - matches whitespace after the content and before the closing angle bracket
(>) - captures the closing angle bracket (group 4)
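For reference, here is a minimal Python sketch of the same substitution. Python's re module is used only for illustration, and the function name squeeze_tag_spaces and the sample string (including the </ b> closing tag) are assumptions. Python's re.sub already replaces every match, so the /g flag has no equivalent here.

```python
import re

# Same pattern as above, without the /.../g delimiters.
TAG_SPACES = re.compile(r"(<)\s*(/?)\s*([^<>]*\S)\s*(>)")

def squeeze_tag_spaces(html: str) -> str:
    """Remove whitespace inside tags; text between tags is left untouched."""
    return TAG_SPACES.sub(r"\1\2\3\4", html)

broken = "< div class='test' >1 > 0 is < b >true</ b> and apples >>> bananas< / div >"
print(squeeze_tag_spaces(broken))
# <div class='test'>1 > 0 is <b>true</b> and apples >>> bananas</div>
```

Because the pattern can only start at a <, a bare > in the text (1 > 0, >>>) is never touched. A literal < in the text is another matter: "a < b and c > d" becomes "a <b and c> d", which is why escaping such characters as &lt; / &gt; first, as the other answer suggests, is still worthwhile.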