
What does a PHP string consist of?

藏色散人 Original
2023-02-07 09:48:34

A PHP string is a series of characters in which each character is one byte. PHP therefore supports only a 256-character set and has no native Unicode support. Internally, a string is implemented as an array of bytes plus an integer giving the buffer length.


The operating environment of this tutorial: Windows 10 system, PHP version 8.1, DELL G3 computer

What does a PHP string consist of?

A string is a series of characters, where each character is the same as one byte. This means that PHP supports only a 256-character set and therefore has no native Unicode support.

Details of the string type

In PHP, a string is implemented as an array of bytes plus an integer indicating the buffer length. It carries no information about how those bytes translate to characters; that is left to the programmer. There are no restrictions on the values a string can contain. In particular, bytes with the value 0 ("NUL bytes") are allowed anywhere in the string, although a few functions, described in the PHP manual as not "binary safe", may hand the string to libraries that ignore all data after a NUL byte.
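As a minimal sketch of the point above, the snippet below builds a string with an embedded NUL byte and shows that the length and contents cover every byte. The variable name `$data` is only illustrative.

```php
<?php
// A five-byte string with a NUL byte in the middle, written with an escape sequence.
$data = "ab\0cd";

echo strlen($data), "\n";   // 5: the NUL byte counts like any other byte
echo bin2hex($data), "\n";  // 6162006364
echo ord($data[2]), "\n";   // 0: byte offset 2 holds the NUL byte
```

The length comes from the stored buffer length, not from scanning for a terminator, which is why the NUL byte does not cut the string short.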

This property of the string type explains why PHP has no separate "byte" type: strings fill that role. Functions that return non-textual data, such as arbitrary data read from a network socket, still return strings.
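A small illustration, using `pack()` as a stand-in for binary data that might otherwise arrive from a socket (the choice of `pack()` and the sample value are assumptions made for this sketch):

```php
<?php
// pack() produces binary data, and the result is an ordinary PHP string.
$packed = pack('N', 0x01020304);   // 32-bit unsigned integer, big-endian

echo gettype($packed), "\n";       // string
echo strlen($packed), "\n";        // 4
echo bin2hex($packed), "\n";       // 01020304

// unpack() reads the same bytes back into an integer.
$value = unpack('N', $packed)[1];
echo $value, "\n";                 // 16909060
```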

Since PHP does not dictate a specific encoding for strings, how are string literals encoded? For example, is the string "á" equal to "\xE1" (ISO-8859-1), "\xC3\xA1" (UTF-8, normalization form C), "\x61\xCC\x81" (UTF-8, normalization form D), or some other representation? The answer is that the string is encoded in whatever encoding the script file uses. If the script is written in ISO-8859-1, its strings are encoded in ISO-8859-1, and so on. This does not apply when Zend Multibyte is enabled; in that case the script may be written in an arbitrary encoding (explicitly declared or automatically detected), is converted to a certain internal encoding, and its string literals use that internal encoding.

Note that the encoding of the script (or the internal encoding, if Zend Multibyte is enabled) is subject to some constraints: it should almost always be a compatible superset of ASCII, such as UTF-8 or ISO-8859-1. Be aware, however, that state-dependent encodings, in which the same byte values can occur in both initial and non-initial shift states, may cause problems.
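The three byte sequences mentioned above can be compared directly. The sketch below writes them with hex escapes, so the result does not depend on how the script file itself is saved; the variable names are illustrative.

```php
<?php
$latin1 = "\xE1";          // "á" in ISO-8859-1
$nfc    = "\xC3\xA1";      // "á" in UTF-8, normalization form C
$nfd    = "\x61\xCC\x81";  // "á" in UTF-8, normalization form D: "a" + U+0301

foreach (['ISO-8859-1' => $latin1, 'UTF-8 NFC' => $nfc, 'UTF-8 NFD' => $nfd] as $label => $bytes) {
    printf("%-10s %d byte(s): %s\n", $label, strlen($bytes), bin2hex($bytes));
}

// Comparison is byte-wise, so two encodings of the same character are not equal.
var_dump($nfc === $nfd);   // bool(false)
```

PHP sees three different strings of 1, 2, and 3 bytes; deciding that they represent the same character is up to the program.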

Of course, to be useful, functions that operate on text have to make some assumptions about how the string is encoded. Unfortunately, PHP's functions vary widely in this respect:

  • Some functions assume that the string is encoded in a single-byte encoding, but do not need to interpret the bytes as specific characters. Examples are substr(), strpos(), strlen() and strcmp(). Another way to think about these functions is that they operate on memory buffers, i.e. they work with bytes and byte offsets.

  • Other functions are passed the encoding of the string, and may fall back on a default if no encoding is given. Examples include htmlentities() and most functions in the mbstring extension.

  • Other functions use the current locale (see setlocale()), but operate byte by byte.

  • Finally, some functions assume that the string uses a specific encoding, usually UTF-8. This is the case for most functions in the intl extension and in the PCRE extension (in the latter case, only when the u modifier is used).
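A rough sketch of how these categories differ on the same input. It uses only core functions and PCRE; the sample string and variable names are assumptions made for illustration.

```php
<?php
$s = "\xC3\xA1bc";   // "ábc" encoded as UTF-8: 4 bytes, 3 characters

// Byte-oriented function: counts bytes, not characters.
$byteCount = strlen($s);                                   // 4

// PCRE without the u modifier matches byte by byte.
$byteMatches = preg_match_all('/./s', $s);                 // 4

// PCRE with the u modifier treats the subject as UTF-8.
$charMatches = preg_match_all('/./su', $s);                // 3

// htmlentities() is told the encoding explicitly; the result depends on it.
$asUtf8   = htmlentities($s, ENT_QUOTES, 'UTF-8');         // &aacute;bc
$asLatin1 = htmlentities($s, ENT_QUOTES, 'ISO-8859-1');    // &Atilde;&iexcl;bc

echo $byteCount, ' ', $byteMatches, ' ', $charMatches, "\n";
echo $asUtf8, "\n", $asLatin1, "\n";
```

The same four bytes yield different answers depending on what each function assumes about the encoding, which is why the assumption has to match the data.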

Ultimately, writing programs that handle Unicode correctly depends on carefully avoiding functions that will not work and are likely to corrupt the data, and using instead the functions that do behave correctly, generally those from the intl and mbstring extensions. But using functions that can handle Unicode encodings is only the beginning. Whatever functions a language provides, it is essential to understand the Unicode specification. For example, a program that assumes characters are only ever uppercase or lowercase is making a wrong assumption.
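To illustrate how byte-oriented functions can corrupt UTF-8 data, here is a minimal sketch. The helper `isValidUtf8()` is hypothetical, written for this example; it relies on `preg_match()` returning false when the u modifier is given an invalid UTF-8 subject.

```php
<?php
// Hypothetical helper: an empty pattern with the u modifier matches any
// valid UTF-8 subject (returns 1) and fails on invalid UTF-8 (returns false).
function isValidUtf8(string $s): bool
{
    return preg_match('//u', $s) === 1;
}

$s = "\xC3\xA1bc";   // "ábc" encoded as UTF-8

var_dump(isValidUtf8($s));                // bool(true)
var_dump(isValidUtf8(substr($s, 0, 1)));  // bool(false): cut in the middle of "á"
var_dump(isValidUtf8(strrev($s)));        // bool(false): the bytes of "á" are reversed
```

Both substr() and strrev() work as documented on bytes; the damage comes from applying them to multi-byte text.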
