What is the difference between text files and binary files?
The difference between text files and binary files: 1. Text files are files based on character encoding. Common encodings include ASCII and the Unicode encodings (such as UTF-8 and UTF-16); 2. Binary files are files based on value encoding.
The difference between text files and binary files:
1. The definition of text files and binary files
As everyone knows, computer storage is physically binary, so the difference between text files and binary files is not physical but logical. The two differ only at the encoding level.
Simply put, text files are files based on character encoding. Common encodings include ASCII and the Unicode encodings (such as UTF-8 and UTF-16). Binary files are files based on value encoding: the application decides what a given value means (this can be regarded as a custom encoding).
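To make the two kinds of encoding concrete, here is a minimal sketch that stores the same number both ways. The number 65 and the variable names are chosen only for illustration.

```python
number = 65

# Character encoding: store the digits "6" and "5" as ASCII characters.
as_text = str(number).encode("ascii")

# Value encoding: store the number itself in a single byte.
as_value = number.to_bytes(1, byteorder="big")

print(as_text.hex())              # two bytes, one per digit
print(as_value.hex())             # one byte holding the value
print(as_value.decode("ascii"))   # what a text tool would show for that byte
```

The same byte 41H means "the number 65" to a program that uses value encoding and the letter 'A' to a tool that assumes ASCII.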
It can be seen from the above that a text file is encoded character by character, and in the simplest encodings every character has the same fixed length. ASCII is a 7-bit code that is normally stored as one 8-bit byte per character, and UTF-32 stores every Unicode character in 32 bits. (UTF-8 and UTF-16 use a varying number of bytes per character, but every part of the file still decodes to a character.) Binary files can be regarded as variable-length encoding, because they use value encoding: how many bits represent a value is entirely up to you. You may be familiar with BMP files, so let's take them as an example. A BMP file starts with fixed-length file header information. The first 2 bytes record that the file is in BMP format (the characters "BM"), the next 4 bytes record the file length, the next 4 bytes are reserved, the next 4 bytes record the offset at which the pixel data begins, and so on. As you can see, the encoding is based on values of different lengths (here 2 and 4 bytes), so BMP is a binary file.
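The BMP example can be sketched in a few lines. This assumes the standard 14-byte BMP file header layout (2-byte signature, 4-byte file size, two 2-byte reserved fields, 4-byte pixel-data offset, all little-endian). The header is built in memory with made-up values (70 and 54) rather than read from a real image.

```python
import struct

# "<"  = little-endian, no padding
# "2s" = 2 raw bytes, "I" = 4-byte unsigned int, "H" = 2-byte unsigned int
BMP_FILE_HEADER = struct.Struct("<2sIHHI")

# Synthetic header with hypothetical values, not taken from a real file.
raw = BMP_FILE_HEADER.pack(b"BM", 70, 0, 0, 54)

signature, file_size, reserved1, reserved2, pixel_offset = BMP_FILE_HEADER.unpack(raw)

print(len(raw))       # total header length in bytes
print(signature)      # 2-byte format marker
print(file_size)      # 4-byte value: file length
print(pixel_offset)   # 4-byte value: where the pixel data begins
```

Each field is decoded by its agreed width and meaning, not character by character, which is what value encoding amounts to in practice.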
2. Access to text files and binary files
What is the process of opening a file with a text tool? Take Notepad as an example. It first reads the binary bit stream that physically corresponds to the file (as mentioned earlier, storage is binary), then interprets this stream according to the decoding method you choose, and finally displays the result. Generally speaking, the decoding method will be ASCII (one ASCII character occupies 8 bits), so Notepad interprets the file stream 8 bits at a time. Take the file stream "01000001_01000010_01000011_01000100" (the underscores '_' are added by hand to improve readability). Decoded as ASCII, the first 8 bits '01000001' correspond to the character 'A', and the other three groups decode to 'B', 'C' and 'D' in the same way. The file stream is therefore interpreted as "ABCD", and Notepad displays "ABCD" on the screen.
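The decoding step a text tool performs can be imitated as follows. This is only a sketch of the idea, not how Notepad is actually implemented. The bit string is the ASCII pattern for "ABCD", with underscores added for readability.

```python
bit_stream = "01000001_01000010_01000011_01000100"

# Split on the readability underscores and turn each 8-bit group into a byte.
byte_values = bytes(int(group, 2) for group in bit_stream.split("_"))

# Interpret the bytes with the chosen character encoding.
text = byte_values.decode("ascii")

print(byte_values.hex())
print(text)
```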
In fact, whenever two things in the world want to communicate, they need an agreed protocol and an agreed encoding. People communicate through words. In Chinese, the character for "mother" represents the person who gave birth to you; this is an agreed code. But the same character in Japanese may mean the person you gave birth to. So when a Chinese person A and a Japanese person B use this character to communicate, misunderstandings are only to be expected. Opening a binary file with Notepad is a similar situation. No matter what file it opens, Notepad works according to an established character encoding (such as ASCII), so when it opens a binary file, garbled characters are inevitable: the encoding and the decoding do not correspond. For example, the file stream '00000000_00000000_00000000_00000001' may correspond to a four-byte integer with the value 1 in a binary file. Interpreted by Notepad, it becomes the four control characters "NUL_NUL_NUL_SOH".
Writing a text file is basically the reverse of reading it, so it is not described again. Accessing a binary file follows the same pattern as accessing a text file, except that the encoding and decoding methods are different.
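The mismatch can be reproduced directly. The sketch below takes the four bytes 00 00 00 01 and reads them three ways. Which byte order the writing program used is an assumption here, and both possibilities are shown.

```python
import struct

raw = bytes([0x00, 0x00, 0x00, 0x01])

# Value decoding: a 4-byte signed integer, most significant byte first.
(big_endian_value,) = struct.unpack(">i", raw)

# The same bytes read least significant byte first give a different value.
(little_endian_value,) = struct.unpack("<i", raw)

# Character decoding: four ASCII control characters (NUL NUL NUL SOH).
as_text = raw.decode("ascii")

print(big_endian_value)
print(little_endian_value)
print(repr(as_text))
```

Only a reader that knows the writer's convention recovers the intended value; a text tool sees nothing but unprintable characters.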
3. Advantages and Disadvantages of Text Files and Binary Files
Because text files and binary files differ only in encoding, their advantages and disadvantages are those of their encodings, and a book on coding will make them clearer. It is generally believed that text file encoding is based on fixed-length characters and is easier to decode, while binary file encoding is variable-length, so it is flexible and uses storage more efficiently but is harder to decode (each binary file format has its own decoding method). Regarding space utilization, consider that a binary file can use a single bit to carry a meaning (through bit operations), while any meaning in a text file takes at least one character.
Many books also state that text files are more readable but need conversion time when stored (reading and writing require encoding and decoding), while binary files are less readable but need no conversion time (values are read and written directly). Readability here is from the perspective of the software user: almost any text file can be browsed with a general tool such as Notepad, so text files are said to be readable, while reading and writing a particular binary file requires a decoder for that specific format, so binary files are said to be less readable. For example, to read BMP files you must use image viewing software.
The storage conversion time should be understood from a programming perspective. On some platforms, such as Windows, carriage returns and line feeds have to be converted ('\n' is replaced with '\r\n'), so during file reading and writing the runtime library has to check character by character whether the current character is '\n' or '\r\n'. This conversion is not needed on Linux. It can of course come up again when files are shared between two different operating systems (for example, when Linux and Windows systems share text files). How to perform this conversion will be covered in the next article, "Conversion between Linux Text Files and Windows Text Files" ^_^
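The space argument can be checked with a short sketch. The number and the eight flags below are arbitrary illustrative values.

```python
import struct

number = 1234567890

as_text = str(number).encode("ascii")    # one byte per decimal digit
as_binary = struct.pack("<I", number)    # fixed 4-byte unsigned integer

# Eight yes/no settings: one character each as text, one bit each in binary.
flags = [True, False, True, True, False, False, False, True]

flags_text = "".join("1" if flag else "0" for flag in flags).encode("ascii")

packed = 0
for position, flag in enumerate(flags):
    if flag:
        packed |= 1 << position          # set bit number `position`
flags_binary = bytes([packed])

print(len(as_text), len(as_binary))
print(len(flags_text), len(flags_binary))
```

The binary forms are smaller, but they can only be read back by code that knows the integer is 4 bytes wide and which bit stands for which flag.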
4. C text reading and writing and binary reading and writing
Text reading and writing versus binary reading and writing in C is a programming-level issue that depends on the specific platform, so the view that "files read and written in text mode must be text files, and files read and written in binary mode must be binary files" is wrong. The following description does not name the operating system each time, but it always refers to Windows. The difference between C's text-mode and binary-mode reading and writing is reflected only in the handling of carriage returns and line feeds. When writing in text mode, every '\n' (0AH, line feed) is replaced with '\r\n' (0D0AH, carriage return plus line feed) before it is written to the file; when reading in text mode, every '\r\n' is changed to '\n' before it is placed in the read buffer. Because text mode performs this conversion between '\n' and '\r\n', it costs extra time. Binary mode performs no conversion: the data in the write buffer is written to the file as is.
Generally speaking, from a programming perspective, both text and binary reading and writing in C are interactions between a buffer and the binary stream in the file; text mode merely adds the carriage return and line feed conversion. Therefore, when the write buffer contains no newline character '\n' (0AH), text writing and binary writing produce the same result. Similarly, when the file contains no '\r\n' (0D0AH), text reading and binary reading produce the same result.
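The translation described above can be observed on any operating system with the following sketch. It is written in Python rather than C, and passing newline="\r\n" to open() is used here to imitate what the C runtime does in text mode on Windows; that substitution is an assumption of this example, not the C API itself. One difference to keep in mind: Python's default text-mode read also converts a lone '\r', which the C runtime on Windows does not.

```python
import os
import tempfile

fd, path = tempfile.mkstemp()
os.close(fd)
try:
    # Text-mode write: each "\n" in the buffer is stored as "\r\n".
    with open(path, "w", encoding="ascii", newline="\r\n") as f:
        f.write("ab\ncd\n")

    # Binary read: the bytes exactly as they are stored in the file.
    with open(path, "rb") as f:
        on_disk = f.read()

    # Text-mode read: each "\r\n" in the file comes back as "\n".
    with open(path, "r", encoding="ascii") as f:
        read_back = f.read()
finally:
    os.remove(path)

print(on_disk)
print(repr(read_back))
```

The file holds two more bytes than the write buffer did, and the text-mode read removes them again.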
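The claim that the two modes agree when no newline is involved can be checked with in-memory streams. As before, this is a Python sketch in which newline="\r\n" stands in for Windows text mode; the helper names text_mode_bytes and binary_mode_bytes are hypothetical.

```python
import io

def text_mode_bytes(payload: str) -> bytes:
    # Text-mode write that stores every line feed as carriage return + line feed.
    buffer = io.BytesIO()
    writer = io.TextIOWrapper(buffer, encoding="ascii", newline="\r\n")
    writer.write(payload)
    writer.flush()
    return buffer.getvalue()

def binary_mode_bytes(payload: str) -> bytes:
    # Binary write: the buffer contents go out unchanged.
    buffer = io.BytesIO()
    buffer.write(payload.encode("ascii"))
    return buffer.getvalue()

without_newline = "abcd"
with_newline = "ab\ncd"

same = text_mode_bytes(without_newline) == binary_mode_bytes(without_newline)
differ = text_mode_bytes(with_newline) != binary_mode_bytes(with_newline)

print(same, differ)
```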