Splitting Delimited Text with Embedded Quotes
Comma-delimited text is awkward to parse in Java when a field itself contains a comma inside double quotes. This article shows how to split a string on commas while keeping quoted text intact.
Consider the following text:
123,test,444,"don't split, this",more test,1
Using the basic String.split(",") method would yield the following result:
123
test
444
"don't split
 this"
more test
1
However, the goal is to retain the quoted text as a single entity:
123
test
444
"don't split, this"
more test
1
To achieve this, we employ a regular expression-based solution:
str.split(",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)");
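This expression splits only on commas that are followed by an even number of double quotes (zero counts as even) between the comma and the end of the string. A comma inside quoted text has an odd number of quotes after it, so it is not treated as a delimiter. The check assumes the quotes in the input are balanced.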
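To see the difference in practice, here is a minimal sketch that runs both splits on the sample text. The class name QuotedSplitDemo and the helper names naiveSplit and quoteAwareSplit are made up for illustration; only String.split and the regular expression come from the article.

```java
import java.util.Arrays;

class QuotedSplitDemo {
    // The article's expression: a comma followed by an even number of
    // double quotes before the end of the input.
    static final String COMMA_OUTSIDE_QUOTES = ",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)";

    // Hypothetical helper: splits on every comma, quoted or not.
    static String[] naiveSplit(String line) {
        return line.split(",");
    }

    // Hypothetical helper: splits only on commas outside double quotes.
    static String[] quoteAwareSplit(String line) {
        return line.split(COMMA_OUTSIDE_QUOTES);
    }

    public static void main(String[] args) {
        String line = "123,test,444,\"don't split, this\",more test,1";
        // The naive split cuts the quoted field in two (7 tokens).
        System.out.println(Arrays.toString(naiveSplit(line)));
        // The quote-aware split keeps it whole (6 tokens).
        System.out.println(Arrays.toString(quoteAwareSplit(line)));
    }
}
```

Note that the quoted field keeps its surrounding double quotes; the split does not strip them.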
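Understanding the Regular Expression:
The leading , matches a comma. The look-ahead (?= ... ) then requires that everything from that comma to the end of the string ($) consists of zero or more pairs of double quotes, (?:[^\"]*\"[^\"]*\")*, followed by any remaining non-quote characters, [^\"]*. The \" sequences are double quotes escaped for the Java string literal. Because a look-ahead consumes no characters, only the comma itself is removed by the split.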
Alternative Syntax:
For readability, you can also spread the regular expression over several lines with the (?x) modifier, which turns on comments mode so that whitespace inside the pattern is ignored:
String[] arr = str.split("(?x) " +
        ",                " + // Split on comma
        "(?=              " + // Followed by
        "  (?:            " + // Start a non-capture group
        "    [^\"]*       " + // 0 or more non-quote characters
        "    \"           " + // 1 quote
        "    [^\"]*       " + // 0 or more non-quote characters
        "    \"           " + // 1 quote
        "  )*             " + // 0 or more repetitions of the non-capture group (so the quote count is even)
        "  [^\"]*         " + // Finally 0 or more non-quotes
        "  $              " + // Till the end (necessary, else every comma would satisfy the condition)
        ")                "   // End look-ahead
);
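Calling str.split(regex) behaves like Pattern.compile(regex).split(str), and with the default limit it discards trailing empty strings. When many lines are parsed, or when empty fields at the end of a line matter, one option is to compile the pattern once and pass a negative limit. The sketch below assumes that is what you want; the class name QuotedSplitter and the method splitKeepingEmpty are hypothetical.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class QuotedSplitter {
    // Same expression as in the article, compiled once and reused.
    private static final Pattern COMMA_OUTSIDE_QUOTES =
            Pattern.compile(",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)");

    // Hypothetical helper: a limit of -1 keeps trailing empty fields.
    static String[] splitKeepingEmpty(String line) {
        return COMMA_OUTSIDE_QUOTES.split(line, -1);
    }

    public static void main(String[] args) {
        String line = "a,\"b,c\",,";
        // Default limit: the two trailing empty fields are dropped.
        System.out.println(Arrays.toString(line.split(COMMA_OUTSIDE_QUOTES.pattern())));
        // Negative limit: the trailing empty fields are kept.
        System.out.println(Arrays.toString(splitKeepingEmpty(line)));
    }
}
```

For the input a,"b,c",, the default split returns two tokens, while the negative limit returns four, the last two being empty strings.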
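Splitting with this look-ahead treats only the commas outside double quotes as delimiters, so quoted fields come back whole, with their surrounding quotes, as long as the quotes in the input are balanced.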