Sharing an implementation of converting Unicode to UTF-8 in PHP

PHPz
Release: 2023-03-06 11:28:02

The following article shows how to use PHP to convert Unicode escapes to UTF-8. I found it quite useful, so I am sharing it here as a reference for everyone.

The example is as follows:

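// Decode %uXXXX escapes and &#xXXXX; / &#NNNNN; HTML entities into UTF-8.
// Only BMP characters (U+0000 to U+FFFF) are supported, because UCS-2 cannot go higher.
function unescape($str) {
  $str = rawurldecode($str);
  // Split into escape sequences and single ordinary characters.
  // /U makes .+ lazy; /s lets . match newlines so they are not silently dropped.
  preg_match_all("/(?:%u.{4})|&#x.{4};|&#\d+;|.+/sU", $str, $r);
  $ar = $r[0];
  foreach ($ar as $k => $v) {
    if (substr($v, 0, 2) == "%u") {
      // %u7D2B: the last four characters are the hex code unit
      $ar[$k] = iconv("UCS-2BE", "UTF-8", pack("H4", substr($v, -4)));
    }
    elseif (substr($v, 0, 3) == "&#x") {
      // &#x7D2B;: the hex digits between "&#x" and ";"
      $ar[$k] = iconv("UCS-2BE", "UTF-8", pack("H4", substr($v, 3, -1)));
    }
    elseif (substr($v, 0, 2) == "&#") {
      // &#32043;: the decimal value packed as a big-endian 16-bit integer
      $ar[$k] = iconv("UCS-2BE", "UTF-8", pack("n", (int) substr($v, 2, -1)));
    }
  }
  return implode("", $ar);
}
echo unescape("&#x7D2B;&#x661F;&#x84DD;"); // prints 紫星蓝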
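On PHP 7.2 or later with the mbstring extension, the same decoding can be sketched without iconv() or pack(). The version below is a minimal sketch under that assumption, and the function name `unescape_mb` is hypothetical. It uses preg_replace_callback() to replace each `%uXXXX`, `&#xHEX;` or `&#DEC;` sequence with the result of mb_chr(), which, unlike the UCS-2 route, also accepts code points above U+FFFF.

```php
<?php
// Hypothetical alternative to unescape(): turns %uXXXX, &#xHEX; and &#DEC;
// sequences into UTF-8 using mb_chr() instead of iconv() + pack().
// Assumes PHP 7.2+ with the mbstring extension enabled.
function unescape_mb(string $str): string
{
    // Decode ordinary %XX escapes first, as the original function does;
    // "%u" is left alone because "u" is not a hex digit.
    $str = rawurldecode($str);

    $out = preg_replace_callback(
        '/%u([0-9A-Fa-f]{4})|&#x([0-9A-Fa-f]{1,6});|&#(\d{1,7});/',
        function (array $m): string {
            // Unmatched trailing groups are absent from $m, so check in order.
            if ($m[1] !== '') {                // %u7D2B
                $cp = (int) hexdec($m[1]);
            } elseif (($m[2] ?? '') !== '') {  // &#x7D2B;
                $cp = (int) hexdec($m[2]);
            } else {                           // &#32043;
                $cp = (int) $m[3];
            }
            // mb_chr() returns false for invalid code points;
            // in that case keep the original escape text unchanged.
            $ch = mb_chr($cp, 'UTF-8');
            return $ch === false ? $m[0] : $ch;
        },
        $str
    );
    return $out ?? $str;  // preg_replace_callback() returns null on regex errors
}

echo unescape_mb('&#x7D2B;&#x661F;&#x84DD; %u7D2B &#128512;'), PHP_EOL; // 紫星蓝 紫 😀
```

Text that is not an escape sequence passes through untouched, so there is no need to split the string into pieces and join them back together.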

Today a user reported that Chinese text submitted through our form system came out garbled. Testing showed that the problem lay in this iconv conversion:
iconv('UCS-2', 'GBK', 'Chinese')
A Google search showed that the cause is that UCS-2 is interpreted differently on Linux servers than on Windows.

So I changed it to:

iconv('UCS-2BE', 'GBK', 'Chinese')
After this change, the Chinese text displayed correctly.
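The explicit name matters because UCS-2 is just a sequence of 16-bit code units, and the byte order decides how each pair of bytes is read. The short sketch below, which assumes the iconv extension is available, feeds both byte orders of U+4E2D (中) to iconv and repeats the GBK conversion with an explicit `UCS-2BE`; the sample bytes are chosen purely for illustration.

```php
<?php
// The same character U+4E2D (中) stored in the two possible byte orders.
$be = "\x4E\x2D";  // big-endian: high byte first
$le = "\x2D\x4E";  // little-endian: low byte first

// Naming the byte order explicitly decodes both correctly.
$fromBe = iconv('UCS-2BE', 'UTF-8', $be);  // "中"
$fromLe = iconv('UCS-2LE', 'UTF-8', $le);  // "中"

// Reading little-endian bytes as big-endian gives U+2D4E, a different character.
$wrong = iconv('UCS-2BE', 'UTF-8', $le);

// Converting to GBK with an explicit byte order, as in the fix above.
$gbk = iconv('UCS-2BE', 'GBK', $be);
echo bin2hex($gbk), PHP_EOL;  // d6d0
```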

Below are some lesser-known rules about how the two platforms handle UCS-2:

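1. UCS-2 is not the same as UTF-16. UCS-2 is a fixed-width encoding that always uses exactly two bytes per character, so it can only represent the Basic Multilingual Plane (U+0000 to U+FFFF). UTF-16 uses the same two bytes for those characters but encodes characters above U+FFFF as four-byte surrogate pairs. For BMP characters the two encodings produce identical bytes; they differ only for supplementary characters.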

2. On Windows, UCS-2 defaults to UCS-2LE (little-endian). MultiByteToWideChar (or the A2W macro) produces UCS-2LE text. Windows Notepad can also save text as UCS-2BE, which amounts to an extra layer of conversion.

3. On Linux, UCS-2 defaults to UCS-2BE (big-endian): converting with iconv and specifying plain UCS-2 produces UCS-2BE. To convert UCS-2 data that came from Windows, you need to specify UCS-2LE explicitly.

4. Because platforms such as Windows and Linux interpret UCS-2 differently (UCS-2LE vs. UCS-2BE), Microsoft recommends starting Unicode text with a byte order mark (BOM). It signals that the following bytes are Unicode and whether they are big-endian or little-endian: the bytes FF FE mark UCS-2LE and FE FF mark UCS-2BE. So if data coming from Windows starts with this prefix, don't panic.

5. When a Linux program produces output, for example by writing to a file or calling printf, the console's encoding has to match the encoding of that output (a mismatch usually traces back to the encoding used when the program was compiled). Input typed at the console is likewise converted according to the current system encoding. For example, if the console is set to UTF-8, UTF-8 text displays correctly but GBK text does not; if it is set to GBK, only GBK text displays correctly. Newer systems should handle more of these conversions automatically, but when connecting through a terminal such as PuTTY you still need to configure the terminal's character-set conversion to avoid garbled characters.
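Points 2 to 4 suggest a practical pattern: look for a BOM first, and fall back to a guessed byte order only when there is none. Here is a minimal sketch, assuming the iconv extension; the function name `ucs2_to_utf8` and the big-endian default for BOM-less input are assumptions made for this example, not something the article specifies.

```php
<?php
// Hypothetical helper: decode UCS-2 bytes to UTF-8, honouring a leading BOM.
// $fallback (assumed default: big-endian) is used only when no BOM is present.
function ucs2_to_utf8(string $bytes, string $fallback = 'UCS-2BE'): string
{
    $bom = substr($bytes, 0, 2);
    if ($bom === "\xFF\xFE") {          // FF FE: little-endian, typical on Windows
        $encoding = 'UCS-2LE';
        $bytes = substr($bytes, 2);     // strip the BOM before converting
    } elseif ($bom === "\xFE\xFF") {    // FE FF: big-endian
        $encoding = 'UCS-2BE';
        $bytes = substr($bytes, 2);
    } else {                            // no BOM: trust the caller's guess
        $encoding = $fallback;
    }

    $out = iconv($encoding, 'UTF-8', $bytes);
    if ($out === false) {
        throw new RuntimeException("Input is not valid $encoding");
    }
    return $out;
}

// "中文" (U+4E2D U+6587) as little-endian bytes with a BOM, like Windows output.
$fromWindows = "\xFF\xFE\x2D\x4E\x87\x65";
echo ucs2_to_utf8($fromWindows), PHP_EOL; // 中文
```

Defaulting to big-endian matches what the article describes for Linux; pass `'UCS-2LE'` as the second argument when you know BOM-less data came from Windows.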

That is all I have to share about converting Unicode to UTF-8 with PHP. I hope it serves as a useful reference, and I hope you will continue to support the php Chinese website.

source:php.cn
Statement of this Website
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn