Reposted from: coolcode.cn
A few days ago I wrote an article on how to make web pages display correctly under any character set. The idea in it is very simple: every character outside the first 128 (the ASCII range) is written as an NCR (numeric character reference). I did not describe the actual conversion method, because at the time I thought it was too simple. Later I found that people were asking about it, so I will explain it in detail here.
The first step is to convert the string from its source character set into UTF-16. The reason is that every character in the Basic Multilingual Plane takes exactly two bytes in UTF-16 (characters beyond it take a pair of two-byte units), which makes the later processing easy. Working directly on the source character set would be very complicated. The source character set can be read from the meta tag of the original web page, or it can be specified separately. My program lets the user specify the source character set in the form, because I cannot guarantee that the submitted file is an HTML file. Other files can be processed the same way: for example, the Chinese language pack source for WordPress is a po file. Even an HTML file does not necessarily contain a meta tag that declares its character set, so specifying it through the form is the safer choice.
You may think that converting one character set to another is complicated. It is indeed very troublesome to implement yourself, but it is easy in PHP, which already provides the iconv function for converting between character sets. If the iconv extension is not installed on your machine, you can use the mb_convert_encoding function instead. If the Multibyte String extension is not installed either, there is not much you can do, because writing converters for that many encodings yourself is not realistic unless you are a top expert. I recommend iconv because it is more efficient and supports more character sets.
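The fallback order described above (iconv first, then mb_convert_encoding) can be sketched roughly as follows. The helper name to_utf16be is hypothetical and not part of the original program; the sketch assumes at least one of the two extensions is loaded and simply returns false otherwise.

```php
<?php
// Hypothetical helper (not from the original article): convert $str from
// the character set $encode to UTF-16BE using whichever extension exists.
function to_utf16be($encode, $str) {
    if (function_exists('iconv')) {
        $converted = iconv($encode, 'UTF-16BE', $str);
        if ($converted !== false) {
            return $converted;
        }
    }
    if (function_exists('mb_convert_encoding')) {
        return mb_convert_encoding($str, 'UTF-16BE', $encode);
    }
    // Neither extension is available, so no conversion is possible here.
    return false;
}

// "A" followed by U+4E2D, written as UTF-8 bytes.
echo bin2hex(to_utf16be('UTF-8', "A\xE4\xB8\xAD")), "\n"; // 00414e2d
```

Asking for UTF-16BE rather than plain UTF-16 fixes the byte order, so the later two-byte arithmetic can always treat the first byte as the high one.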
After that, process the string two bytes at a time. Each pair of bytes is converted directly into a number, and that number is the xxxxx in &#xxxxx;. If the number is less than 128, output the character itself (note that it becomes a single byte here); otherwise output it in the form &#xxxxx;. One thing to note: when the number is 65279 (hexadecimal 0xFEFF), skip it. This is the byte order mark (BOM) used in Unicode encodings, and since the output string contains only the first 128 characters of ISO-8859-1, we do not need it.
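As a small worked example of the rule just described, the sketch below turns a single two-byte unit into its output form. The function name unit_to_ncr is made up for illustration and does not appear in the original program.

```php
<?php
// Hypothetical helper: map one 16-bit code unit to its output text.
function unit_to_ncr($code) {
    if ($code < 128) {
        return chr($code);          // ASCII stays a single byte
    }
    if ($code == 0xFEFF) {
        return '';                  // byte order mark: dropped
    }
    return '&#' . $code . ';';      // everything else becomes an NCR
}

// U+4E2D is stored in UTF-16BE as the two bytes 0x4E 0x2D.
$pair = "\x4E\x2D";
$code = ord($pair[0]) * 256 + ord($pair[1]); // 78 * 256 + 45
echo $code, "\n";                // 20013
echo unit_to_ncr($code), "\n";   // &#20013;
```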
Okay, that is the basic idea. Here is the implementation:
Download: nochaoscode.php
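<?php
// Convert $str from the character set $encode into ASCII text in which
// every non-ASCII character is written as a decimal NCR (&#xxxxx;).
function nochaoscode($encode, $str) {
    $str = iconv($encode, "UTF-16BE", $str);
    $output = "";
    for ($i = 0; $i + 1 < strlen($str); $i += 2) {
        // Combine two bytes (big-endian) into one 16-bit number.
        $code = ord($str[$i]) * 256 + ord($str[$i + 1]);
        if ($code < 128) {
            // ASCII: output the character itself as a single byte.
            $output .= chr($code);
        } else if ($code != 65279) {
            // Skip the BOM (0xFEFF); everything else becomes an NCR.
            $output .= "&#" . $code . ";";
        }
    }
    return $output;
}
?>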
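The two-byte loop treats every 16-bit unit as one character, which holds for the Basic Multilingual Plane. Characters above U+FFFF (many emoji, for example) are stored in UTF-16 as a surrogate pair, and converting each half separately would produce two references in the surrogate range instead of one valid NCR. The following is a minimal sketch of one way to handle that, assuming the iconv extension is available. The name nochaoscode_astral and the surrogate handling are additions for illustration, not part of the original program.

```php
<?php
// Hypothetical variant of nochaoscode(): same idea, but a surrogate pair
// is merged into one code point before the NCR is written.
function nochaoscode_astral($encode, $str) {
    $str = iconv($encode, 'UTF-16BE', $str);
    $len = strlen($str);
    $output = '';
    for ($i = 0; $i + 1 < $len; $i += 2) {
        $code = ord($str[$i]) * 256 + ord($str[$i + 1]);
        // A high surrogate followed by a low surrogate encodes a single
        // code point above U+FFFF.
        if ($code >= 0xD800 && $code <= 0xDBFF && $i + 3 < $len) {
            $low = ord($str[$i + 2]) * 256 + ord($str[$i + 3]);
            if ($low >= 0xDC00 && $low <= 0xDFFF) {
                $code = 0x10000 + (($code - 0xD800) << 10) + ($low - 0xDC00);
                $i += 2; // the low surrogate has been consumed
            }
        }
        if ($code < 128) {
            $output .= chr($code);
        } else if ($code != 0xFEFF) {
            $output .= '&#' . $code . ';';
        }
    }
    return $output;
}

// "Hi ", U+4E2D, a space, and U+1F600, written as UTF-8 bytes.
echo nochaoscode_astral('UTF-8', "Hi \xE4\xB8\xAD \xF0\x9F\x98\x80"), "\n";
// Hi &#20013; &#128512;
```

For input that stays within the Basic Multilingual Plane, this variant produces the same output as the plain two-byte loop.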