There are multiple JS libraries that allow export to excel. But what if we want to follow a minimalist approach avoiding extra dependencies.
The simplest approach would be to produce CSV from JSON that can easily be opened in excel.
But before I show the conversion logic, let's understand what CSV is and which encoding we shall use when creating a CSV file.
CSV format
This RFC 4180 Common Format and MIME Type for Comma-Separated Values (CSV) Files specifies definition of the CSV format. Note that this is a memo only as the CSV format is not officially standardized.
Main definitions
- Each record is located on a separate line, delimited by a line break (CRLF).
- The last record in the file may or may not have an ending line break.
- Header should contain the same number of fields throughout the file.
- Each field may or may not be enclosed in double quotes.
- Fields containing line breaks (CRLF), double quotes, and commas should be enclosed in double-quotes.
- If double-quotes are used to enclose fields, then a double-quote appearing inside a field must be escaped by preceding it with another double quote.
Encoding
In my JSON data I have characters æ å ø from ISO Latin-1 (ISO/IEC 8859-1) character set that have to be considered when creating a csv file.
Some excerpts The Unicode® Standard Version 15.0 to consider.
Unicode vs UTF-8/16/32
From The Unicode® Standard Version 15.0:
- Unicode is the universal character encoding standard for written characters and text, containing 149,186 characters from the world’s scripts.
- Unicode characters are represented in one of three encoding forms: a 32-bit form (UTF-32), a 16-bit form (UTF-16), and an 8-bit form (UTF-8).
- The Unicode Consortium fully endorses the use of any of the three Unicode encoding forms as a conformant way of implementing the Unicode Standard. It is important not to fall into the trap of trying to distinguish “UTF-8 versus Unicode,” for example. UTF-8, UTF-16, and UTF-32 are all equally valid and conformant ways of implementing the encoded characters of the Unicode Standard.
Byte Order Mark (BOM)
- The character U+FEFF (UTF-8 EF BB BF) used for the byte order mark is named zero width no-break space.
- The UTF-16 and UTF-32 encoding forms of Unicode plain text are sensitive to the byte ordering that is used when writing data to a file.
- Identification of the byte sequence at the beginning of a
data stream can be taken as a near-certain indication that the data stream is using the UTF-8 encoding scheme.
In short, adding zero width no-break space before the CSV string will enforce Excel to apply UTF-8 encoding instead of 1252: Western European (Windows) or some other encoding which Excel will choose in case the U+FEFF character is not provided.
I will show the difference between producing CSV file with zero width no-break space and without it in the next post of this series...
The above is the detailed content of Export JSON to CSV: a note on CSV and Unicode. For more information, please follow other related articles on the PHP Chinese website!