
Method 2 of displaying web pages normally in any character set (continued)

WBOY
Release: 2016-07-29 08:36:56

Reposted from: coolcode.cn
A few days ago I wrote an article on how to display web pages normally in any character set. The explanation there was brief: every character outside the first 128 is represented as an NCR (numeric character reference). I did not describe the actual conversion method because at the time I thought it was too simple. Since then someone has asked about it, so I will explain it in detail here.
The first step is to convert the string from the source character set to UTF-16. In UTF-16 every character in the Basic Multilingual Plane occupies exactly two bytes, which makes the later processing easy; working directly on the source character set would be very complicated. The source character set can be read from the meta tag of the original web page, or it can be specified separately. My program lets the user specify the source character set in the form, because I cannot guarantee that the submitted file is an HTML file. Other files can be processed in the same way; for example, the Chinese language pack source for WordPress is a po file. Even an HTML file does not necessarily contain a meta tag that declares the character set, so specifying it through the form is safer.
You may think that converting one character set to another is complicated. Implementing it yourself is indeed a lot of trouble, but in PHP it is easy because the functionality is already built in: the iconv function converts between a wide range of character sets. If the iconv extension is not installed on your machine, you can use the mb_convert_encoding function instead. If the Multibyte String extension is not installed either, there is little you can do, because converting that many encodings by hand is practically impossible unless you are a top expert. I recommend iconv because, in my experience, it is more efficient and supports more character sets.
After this step, process the string two bytes at a time. Each pair of bytes is converted directly into a number, which is the xxxxx in &#xxxxx;. If the number is less than 128, output the character itself (note that it becomes a single byte here); otherwise output it in the form &#xxxxx;. One thing to watch for: when the number is 65279 (hexadecimal 0xFEFF), skip it. This is the byte order mark (BOM) of the Unicode encoding, and since the output now contains only the first 128 characters of ISO-8859-1, it is not needed.
Okay, that is the basic idea. Here is the implementation:
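The fallback order described above (iconv first, then mb_convert_encoding) can be sketched roughly as follows. The helper name to_utf16be is hypothetical and does not appear in the original program; this is a minimal sketch that assumes at least one of the two extensions is loaded.

```php
<?php
// Hypothetical helper (not from the original article): convert $str from
// the character set $encode to UTF-16BE, preferring iconv and falling
// back to the Multibyte String extension.
function to_utf16be(string $encode, string $str): string
{
    if (function_exists('iconv')) {
        $converted = iconv($encode, 'UTF-16BE', $str);
        if ($converted !== false) {
            return $converted;
        }
    }
    if (function_exists('mb_convert_encoding')) {
        return mb_convert_encoding($str, 'UTF-16BE', $encode);
    }
    throw new RuntimeException('Neither iconv nor mbstring is available.');
}

// U+4E2D is the three bytes E4 B8 AD in UTF-8 and the two bytes 4E 2D in UTF-16BE.
echo bin2hex(to_utf16be('UTF-8', "\xE4\xB8\xAD")), "\n"; // prints 4e2d
```

Requesting UTF-16BE rather than plain UTF-16 fixes the byte order, so the high byte always comes first when the pairs are read later.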
Download: nochaoscode.php



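<?php
// Convert $str from the character set $encode into ASCII text in which
// every character above 127 is written as a decimal NCR (&#xxxxx;).
// Note: characters outside the BMP arrive as surrogate pairs and are
// not combined here.
function nochaoscode($encode, $str) {
    $str = iconv($encode, "UTF-16BE", $str);
    $output = "";
    // Walk the UTF-16BE string two bytes at a time.
    for ($i = 0; $i < strlen($str); $i += 2) {
        // High byte first, then low byte.
        $code = ord($str[$i]) * 256 + ord($str[$i + 1]);
        if ($code < 128) {
            // ASCII: emit the character itself as a single byte.
            $output .= chr($code);
        } else if ($code != 65279) {
            // Anything else except the BOM (0xFEFF) becomes an NCR.
            $output .= "&#" . $code . ";";
        }
    }
    return $output;
}
?>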


The function takes two parameters: $encode is the source character set and $str is the string to convert. It returns the converted string.
Addendum: Today Legend told me about a simpler method, which is to call mb_convert_encoding directly. mb_convert_encoding supports an encoding named HTML-ENTITIES, which is NCR encoding, so using it is even simpler.
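The two-byte loop covers characters in the Basic Multilingual Plane. For characters beyond it (emoji, for instance), UTF-16 uses a pair of surrogate code units, and each half would be emitted as its own NCR. The following is a hedged sketch of a variant that combines such pairs. The name nochaoscode_astral is hypothetical and is not part of the original program.

```php
<?php
// Hypothetical variant of nochaoscode() (not the author's code): it also
// combines UTF-16 surrogate pairs into a single code point.
function nochaoscode_astral(string $encode, string $str): string
{
    $utf16 = iconv($encode, 'UTF-16BE', $str);
    if ($utf16 === false || $utf16 === '') {
        return '';
    }
    // Split into 16-bit big-endian code units.
    $units = array_values(unpack('n*', $utf16));
    $output = '';
    for ($i = 0, $n = count($units); $i < $n; $i++) {
        $code = $units[$i];
        // A high surrogate followed by a low surrogate forms one code point.
        if ($code >= 0xD800 && $code <= 0xDBFF && $i + 1 < $n
            && $units[$i + 1] >= 0xDC00 && $units[$i + 1] <= 0xDFFF) {
            $code = 0x10000 + (($code - 0xD800) << 10) + ($units[$i + 1] - 0xDC00);
            $i++; // skip the low surrogate that was just consumed
        }
        if ($code < 128) {
            $output .= chr($code);
        } elseif ($code != 0xFEFF) {
            $output .= '&#' . $code . ';';
        }
    }
    return $output;
}

echo nochaoscode_astral('UTF-8', "a\u{4E2D}\u{1F600}"), "\n";
// prints a&#20013;&#128512;
```

For input limited to the Basic Multilingual Plane, this should produce the same output as the two-byte loop.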
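A call of the form mb_convert_encoding($str, 'HTML-ENTITIES', $encode) matches the description above, but the HTML-ENTITIES encoding name is deprecated as of PHP 8.2. The sketch below therefore uses mb_encode_numericentity from the same extension instead. The wrapper name ncr_encode is hypothetical, and the code assumes mbstring is loaded.

```php
<?php
// Hypothetical wrapper (not from the original article): normalise the
// input to UTF-8, then turn every code point from U+0080 upward into a
// decimal NCR. ASCII characters pass through unchanged.
function ncr_encode(string $str, string $encode): string
{
    $utf8 = mb_convert_encoding($str, 'UTF-8', $encode);
    // Map entries are [start, end, offset, mask].
    $map = [0x80, 0x10FFFF, 0, 0x1FFFFF];
    return mb_encode_numericentity($utf8, $map, 'UTF-8');
}

echo ncr_encode("a\u{4E2D}", 'UTF-8'), "\n"; // prints a&#20013;
```

Because the map starts at 0x80, characters such as & and < are left as they are, just as in the hand-written loop.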

