Maintaining Delimiters during String Splitting
When a string (possibly spanning multiple lines) is made up of parts separated by a set of delimiters, a common task is to split it into those parts. The usual tool is the String.split method. However, split treats its argument as a separator pattern and drops every matched delimiter from the result.
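The loss of delimiters is easy to see with a plain split on ";". The snippet below is a minimal sketch; the class and method names are illustrative only.

```java
import java.util.Arrays;

public class PlainSplitDemo {
    // Splits on the literal ';' delimiter; each matched ';' is consumed,
    // so no delimiter survives in the returned array
    static String[] plainSplit(String input) {
        return input.split(";");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(plainSplit("a;b;c;d"))); // prints [a, b, c, d]
    }
}
```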
Using Lookahead and Lookbehind
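To keep the delimiters, split on lookahead and lookbehind assertions instead of on the delimiter itself. These assertions match a position in the string rather than any characters, so split cuts the string at that position without consuming the delimiter.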
Consider the following code:
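<code class="java">import java.util.Arrays;

System.out.println(Arrays.toString("a;b;c;d".split("(?<=;)")));         // split just after each ';'
System.out.println(Arrays.toString("a;b;c;d".split("(?=;)")));          // split just before each ';'
System.out.println(Arrays.toString("a;b;c;d".split("((?<=;)|(?=;))"))); // split both before and after each ';'</code>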
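This code relies on lookaround assertions, which test the text around a position without consuming it. (?<=;) is a lookbehind: it matches the empty string at a position immediately preceded by a semicolon, that is, just after each ;. (?=;) is a lookahead: it matches the empty string at a position immediately followed by a semicolon, that is, just before each ;. Because these matches are zero-width, split cuts the string at those positions and no characters are removed.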
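To check where each assertion matches, the following sketch lists the start index of every zero-width match with java.util.regex.Matcher. The helper name matchPositions is hypothetical; only Pattern, Matcher, find and start are standard API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookaroundPositions {
    // Hypothetical helper: collects the start index of every match of regex in input.
    // Lookaround-only patterns match zero characters, so each match starts and ends
    // at the same position; find() advances past empty matches on its own.
    static List<Integer> matchPositions(String regex, String input) {
        List<Integer> positions = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            positions.add(m.start());
        }
        return positions;
    }

    public static void main(String[] args) {
        String s = "a;b;c;d"; // the ';' characters are at indices 1, 3 and 5
        System.out.println(matchPositions("(?<=;)", s)); // prints [2, 4, 6]: just after each ';'
        System.out.println(matchPositions("(?=;)", s));  // prints [1, 3, 5]: just before each ';'
    }
}
```

The positions from the lookbehind are exactly where the first split in the example cuts, and the positions from the lookahead are where the second one cuts.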
The final statement splits on ((?<=;)|(?=;)), which matches the positions on both sides of each semicolon, so text segments and delimiters come out as separate elements. The three statements print:
[a;, b;, c;, d]
[a, ;b, ;c, ;d]
[a, ;, b, ;, c, ;, d]
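If each delimiter should be kept as its own element, the third result is the one you want: text segments and delimiters alternate in the array. The first two results fit cases where the delimiter should stay attached to the preceding or the following segment, respectively.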
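The same idea extends to a set of delimiters by putting a character class inside the lookarounds. This is a minimal sketch assuming the delimiters are ; and ,; the constant and class names are illustrative.

```java
import java.util.Arrays;

public class SplitOnDelimiterSet {
    // Illustrative pattern: split just after and just before any ';' or ',',
    // so each delimiter character becomes its own array element
    static final String KEEP_SEMICOLON_OR_COMMA = "((?<=[;,])|(?=[;,]))";

    static String[] split(String input) {
        return input.split(KEEP_SEMICOLON_OR_COMMA);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(split("a;b,c"))); // prints [a, ;, b, ,, c]
    }
}
```

Adjacent delimiters also work: the position between them matches once, so "a;;b" yields the elements a, ;, ; and b.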
Enhancing Readability with Variables
To make such a pattern easier to read, store it in a named constant that describes its purpose. For instance, the following code defines WITH_DELIMITER as a format template; String.format substitutes the actual delimiter for the %1$s placeholder, which appears twice:
<code class="java">public static final String WITH_DELIMITER = "((?<=%1$s)|(?=%1$s))";

public void someMethod() {
    // Both %1$s placeholders receive ";", producing the pattern ((?<=;)|(?=;))
    final String[] aEach = "a;b;c;d".split(String.format(WITH_DELIMITER, ";"));
    // ...
}</code>
This technique helps clarify the intent of the regular expression.
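One caveat with the template: the delimiter is inserted into the regular expression verbatim, so a delimiter such as . or | would be read as a regex metacharacter. A possible fix, sketched below, is to escape it with Pattern.quote before formatting. The helper splitKeepingDelimiter and the class name are hypothetical; WITH_DELIMITER is the same template as above.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class KeepDelimiterSplitter {
    // Same template as above: both %1$s placeholders receive the delimiter
    public static final String WITH_DELIMITER = "((?<=%1$s)|(?=%1$s))";

    // Hypothetical helper: Pattern.quote wraps the delimiter so that
    // metacharacters such as '.' or '|' are matched literally
    static String[] splitKeepingDelimiter(String input, String delimiter) {
        String regex = String.format(WITH_DELIMITER, Pattern.quote(delimiter));
        return input.split(regex);
    }

    public static void main(String[] args) {
        // prints [192, ., 168, ., 0, ., 1]
        System.out.println(Arrays.toString(splitKeepingDelimiter("192.168.0.1", ".")));
        // prints [x, |, y]
        System.out.println(Arrays.toString(splitKeepingDelimiter("x|y", "|")));
    }
}
```

Without the quoting, "." would match any character and "192.168.0.1" would be cut into single characters.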