bytes: an immutable sequence of bytes. Comparing dir(str) and dir(bytes) shows that the attributes and methods of the two are very similar, with only a few differences. bytes therefore supports the same kinds of operations on byte sequences that str supports on text, such as searching (find), length (len), splitting (split), and slicing.
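The comparison can be tried directly. The following is a minimal sketch; the sample value `data` is made up for illustration.

```python
# Names that exist on only one of the two types
only_in_str = sorted(set(dir(str)) - set(dir(bytes)))
only_in_bytes = sorted(set(dir(bytes)) - set(dir(str)))
print(only_in_str)    # includes 'encode', 'format', ...
print(only_in_bytes)  # includes 'decode', 'fromhex', 'hex', ...

# The shared operations work on bytes just as they do on str
data = b'Start hello End'   # illustrative sample
print(len(data))            # 15
print(data.find(b'hello'))  # 6
print(data.split(b' '))     # [b'Start', b'hello', b'End']
print(data[6:11])           # b'hello'
```

Note that the arguments to find() and split() must themselves be bytes, not str.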
The advantage of bytes is that it is a built-in type in Python and does not require installing any third-party module.
The disadvantage is just as clear: find() returns only one match per call, and there is no way to get all the required matches at once.
First open the file in rb mode with open() and read the content as bytes. The find() method locates a given byte string, but it returns only the index of the first match, and that index is a byte offset (one unit per 8 bits), not a bit position. When you need to find several matches, there is no built-in findall() method, so the process becomes tedious: find the first match at index 1, search again starting after index 1 to get index 2, and so on until the end of the data.
```python
# path is the path to the binary file to read
with open(path, 'rb') as f:
    datas = f.read()

start_char = datas.find(b'Start')
# start_char2 = datas.find(b'Start', start_char + 1)
end_char = datas.find(b'End', start_char)
# end_char2 = datas.find(b'End', start_char2)
data = datas[start_char:end_char]
print(data)
```
Note that in the above code the start and end flags can each occur many times in the file, and not necessarily the same number of times. To get every block between a start flag and an end flag, the commented-out lines would have to be repeated once per occurrence. Since we do not know in advance how many start flags the file contains, we do not know how many times to repeat them. This calls for a loop, but find() gives no ready-made sequence to iterate over: the loop has to be driven by hand, feeding each result back in as the next search start and stopping when find() returns -1. This makes the code more involved.
Secondly, because the goal is the content between two flags, the search has to be carried out for both the start flag and the end flag, which complicates the process further.
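For comparison, here is one way the manual loop could look with bytes alone. This is a sketch under assumptions, not the author's code: the function name `extract_between` and the sample data are made up, and the loop relies on find() returning -1 when nothing is found.

```python
def extract_between(data: bytes, start_flag: bytes, end_flag: bytes) -> list:
    """Collect every chunk from a start flag up to (not including) the next end flag."""
    chunks = []
    pos = 0
    while True:
        start = data.find(start_flag, pos)
        if start == -1:   # no more start flags
            break
        end = data.find(end_flag, start + len(start_flag))
        if end == -1:     # start flag with no matching end flag
            break
        chunks.append(data[start:end])
        pos = end + len(end_flag)   # continue searching after this end flag
    return chunks


sample = b'xxStart-one-EndyyStart-two-Endzz'   # illustrative data
print(extract_between(sample, b'Start', b'End'))
# [b'Start-one-', b'Start-two-']
```

It works, but the bookkeeping for both flags (search position, two -1 checks) has to be written by hand.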
A more convenient approach is therefore worth looking for.
bitstring is a third-party package (installed with pip install bitstring) that reads binary files as streams of bits.
The first sentence of the bitstring.py file is: "This package defines classes that simplify bit-wise creation, manipulation and interpretation of data."
Put simply, it operates directly on binary data, bit by bit.
There are four main classes, as follows:
Bits -- An immutable container for binary data.
BitArray -- A mutable container for binary data.
ConstBitStream -- An immutable container with streaming methods.
BitStream -- A mutable container with streaming methods.
As with bytes, first read the file content, then find the keyword indexes, and slice to obtain the data.
```python
from bitstring import ConstBitStream, BitStream

# path is the path to the binary file to read
hex_datas = ConstBitStream(filename=path)  # read the file content

start_char = b'Start'
# findall() finds every match at once and returns a generator of bit positions
start_indexs = list(hex_datas.findall(start_char, bytealigned=True))

end_char = b'End'
end_indexs = []
for start_index in start_indexs:
    # find() finds the first match and returns a tuple (empty if there is none)
    end_pos = hex_datas.find(end_char, start=start_index, bytealigned=True)
    end_indexs.extend(end_pos)

result = []
for i in range(min(len(start_indexs), len(end_indexs))):
    hex_data = hex_datas[start_indexs[i]:end_indexs[i]]
    # same as hex_data.tobytes(); then decode the bytes into a str
    str_data = BitStream.tobytes(hex_data).decode('utf-8')
    result.append(str_data)
```
Code walkthrough: first import the two required classes, ConstBitStream and BitStream, and read the file content with ConstBitStream. findall() finds the positions of all matches, and find() finds the position of the first match. Unlike bytes.find(), these positions are counted in bits rather than bytes. Take the shorter length of the start and end lists and slice to obtain each piece of data, whose type is bitstring.ConstBitStream. The tobytes() method (called here as BitStream.tobytes()) converts it to bytes. Non-ASCII text such as Chinese characters is unreadable in that form, so call decode() to get the required string.
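The decoding step can be illustrated without any file. This is a small sketch with a made-up sample string; the errors='replace' variant is an optional extra, useful when a slice might cut a multi-byte character in half.

```python
text = '数据 data'            # illustrative sample containing Chinese characters
raw = text.encode('utf-8')    # what the data looks like as bytes

print(raw)                    # the Chinese characters show up as \x.. escapes
print(raw.decode('utf-8'))    # readable text again

# A slice that ends in the middle of a character cannot be decoded strictly;
# errors='replace' substitutes U+FFFD instead of raising UnicodeDecodeError
partial = raw[:2].decode('utf-8', errors='replace')
print(partial)
```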
The whole process is concise and linear, using only the findall(), find(), and tobytes() methods. There are still details to watch. For example, if start_indexs is empty the subsequent code should not run, and the same applies if end_indexs is empty.
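The empty-list checks could be collected in a small helper. This is only a sketch: the function name `pair_indexes` is hypothetical, and dropping pairs whose end does not come after the start is an extra assumption, not something the code above does.

```python
def pair_indexes(start_indexs, end_indexs):
    """Pair start and end positions; return [] when either list is empty."""
    if not start_indexs or not end_indexs:
        return []
    # zip() stops at the shorter list, like min(len(...), len(...)) above
    return [(s, e) for s, e in zip(start_indexs, end_indexs) if e > s]


print(pair_indexes([], [40]))             # []
print(pair_indexes([0, 80], [40]))        # [(0, 40)]
print(pair_indexes([0, 80], [40, 120]))   # [(0, 40), (80, 120)]
```

Each returned pair can then be used as the slice bounds for one piece of data.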
As this shows, the bitstring package is fairly easy to use. Only a few of its methods were needed here; it offers many others, which can be chosen as needed.