How to use the iconv function in PHP
This article introduces how to use the iconv function in PHP; readers who need it can use it as a reference.
The iconv library converts strings between character sets and is one of the basic libraries in PHP programming. To install it on Linux:
1. Download the libiconv library: http://ftp.gnu.org/pub/gnu/libiconv/libiconv-1.9.2.tar.gz
2. Unpack it and enter the source directory: tar -zxvf libiconv-1.9.2.tar.gz, then cd libiconv-1.9.2
3. Install libiconv:
#./configure --prefix=/usr/local/iconv
#make
#make install
4. Recompile PHP with the additional configure option --with-iconv=/usr/local/iconv
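After rebuilding, a short script can confirm that the extension is actually loaded. This is a minimal sketch; the helper name `iconv_status` is hypothetical, and it relies only on `extension_loaded()` and the `ICONV_IMPL` / `ICONV_VERSION` constants that the iconv extension defines when it is present.

```php
<?php
// Hypothetical helper: report whether the iconv extension is loaded and
// which iconv implementation PHP was built against.
function iconv_status(): array
{
    return [
        'loaded'  => extension_loaded('iconv'),
        // ICONV_IMPL and ICONV_VERSION are defined only when the extension is loaded.
        'impl'    => defined('ICONV_IMPL') ? ICONV_IMPL : null,
        'version' => defined('ICONV_VERSION') ? ICONV_VERSION : null,
    ];
}

print_r(iconv_status());
```

From the command line, `php -m` also lists the loaded extensions.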
I was recently working on a scraper (a "thief" program that copies content from other sites) and needed iconv to convert the captured UTF-8 pages to GB2312. I found that when I transcoded the captured data with iconv, part of the data went missing for no apparent reason, which frustrated me for a while. After searching online, I learned that this is a known problem with iconv: it fails when converting the character "—" to GB2312.
The fix is simple: append "//IGNORE" to the target encoding, that is, to the second parameter of iconv, as follows:
iconv("UTF-8","GB2312//IGNORE",$data)
//IGNORE tells iconv to skip characters it cannot convert. Without it, everything after the offending character is lost.
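How //IGNORE behaves depends on the iconv implementation PHP was built with (on some glibc-based builds it has been reported to still return false), so a portable alternative is to convert one character at a time and drop only the characters that fail. This is a minimal sketch that assumes UTF-8 input; the function name `iconv_skip_unconvertible` is hypothetical, and it is slower than a single iconv() call.

```php
<?php
// Hypothetical helper: a portable stand-in for "//IGNORE".
// Assumes the input is UTF-8. Each character is converted on its own,
// and characters the target charset cannot represent are dropped.
function iconv_skip_unconvertible(string $utf8, string $to): string
{
    // Split into UTF-8 characters; preg_split() returns false on invalid UTF-8.
    $chars = preg_split('//u', $utf8, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }

    $out = '';
    foreach ($chars as $ch) {
        // @ suppresses the notice iconv() raises for an unconvertible character.
        $converted = @iconv('UTF-8', $to, $ch);
        if ($converted !== false) {
            $out .= $converted;
        }
    }
    return $out;
}

// "你" and "好" exist in GB2312; the emoji does not, so it is skipped.
$gb = iconv_skip_unconvertible("你😀好", 'GB2312');
echo bin2hex($gb), PHP_EOL;
```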
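<?php
// Hypothetical sample string: the original definition of $str was lost.
// Assumes this file is saved as UTF-8.
$str = '你好世界';
$gb = iconv('UTF-8', 'GB2312', $str); // GB2312 copy of $str for the next line
echo iconv('GB2312', 'UTF-8', $gb); // convert the string from GB2312 to UTF-8
echo '<br>';
echo iconv_substr($str, 1, 1, 'UTF-8'); // substring by character count, not by bytes
print_r(iconv_get_encoding()); // current iconv encoding settings
echo iconv_strlen($str, 'UTF-8'); // string length in the given encoding
// It can also be used like this:
$content = $str; // hypothetical input
$content = iconv("UTF-8", "gbk//TRANSLIT", $content);
?>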
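iconv_substr() and iconv_strlen() work on characters, while strlen() counts bytes. A small sketch, assuming the file is saved as UTF-8, makes the difference concrete.

```php
<?php
// Assumes this file is saved as UTF-8.
$s = '你好世界'; // 4 characters, 3 bytes each in UTF-8

$bytes = strlen($s);                      // counts bytes
$chars = iconv_strlen($s, 'UTF-8');       // counts characters
$sub   = iconv_substr($s, 1, 2, 'UTF-8'); // 2 characters, starting at character 1

echo $bytes, ' ', $chars, ' ', $sub, PHP_EOL; // 12 4 好世
```

A byte-based substr($s, 1, 2) on the same string would cut a character in half and produce invalid UTF-8.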
iconv is not a built-in PHP function but an extension module, so on older setups it has to be installed before it can be used (current PHP builds usually enable it by default).
On Windows 2000, edit php.ini and remove the ";" in front of extension=php_iconv.dll. You also need to copy iconv.dll from your original PHP installation package to winnt/system32 (if your DLL path points to that directory).
On Linux, use a static build and add --with-iconv when running configure; phpinfo() will then show an iconv section. (Environment: Linux 7.3, Apache 4.06, PHP 4.3.2.)
Download: ftp://ftp.gnu.org/pub/gnu/libiconv/libiconv-1.8.tar.gz
Installation:
#cp libiconv-1.8.tar.gz /usr/local/src
#cd /usr/local/src
#tar zxvf libiconv-1.8.tar.gz
#cd libiconv-1.8
#./configure --prefix=/usr/local/libiconv
#make
#make install
Compile PHP:
#./configure --prefix=/usr/local/php4.3.2 --with-iconv=/usr/local/libiconv/
Simple example of use:
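A minimal sketch of typical usage, assuming the script itself is saved as UTF-8 and the iconv extension is loaded: convert a string to GBK and back.

```php
<?php
// Assumes this file is saved as UTF-8 and the iconv extension is loaded.
$utf8 = '你好';

// UTF-8 -> GBK: each of these Chinese characters becomes 2 bytes.
$gbk = iconv('UTF-8', 'GBK', $utf8);
if ($gbk === false) {
    throw new RuntimeException('UTF-8 to GBK conversion failed');
}
echo bin2hex($gbk), PHP_EOL; // c4e3bac3

// GBK -> UTF-8 restores the original string.
$back = iconv('GBK', 'UTF-8', $gbk);
var_dump($back === $utf8); // bool(true)
```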
Introduction to mb_convert_encoding and iconv functions in PHP
mb_convert_encoding converts a string from one character encoding to another. I used to not understand the concept of character encoding in programs, but now I think I understand it a little.
English text generally has no encoding problems (ASCII characters are encoded identically in GBK and UTF-8); the problem mainly shows up with Chinese data. For example, if you write your program in Zend Studio or EditPlus with GBK encoding and the data has to go into a database whose encoding is UTF-8, the data must be converted first; otherwise it will be garbled in the database.
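For the database case, one option is to normalize strings to UTF-8 just before inserting them. This is a minimal sketch under two assumptions: anything that is not valid UTF-8 is GBK, and the helper name `to_utf8_for_db` is hypothetical. Some GBK byte sequences also happen to be valid UTF-8, so the check is a heuristic, not real encoding detection.

```php
<?php
// Hypothetical helper: make sure a string is UTF-8 before storing it in a
// UTF-8 database. If it is not valid UTF-8, assume it is GBK and convert it.
function to_utf8_for_db(string $s, string $assumedFrom = 'GBK'): string
{
    // preg_match() with the /u modifier returns false for invalid UTF-8.
    if (preg_match('//u', $s) === 1) {
        return $s; // already valid UTF-8 (plain ASCII also passes)
    }
    $converted = iconv($assumedFrom, 'UTF-8', $s);
    if ($converted === false) {
        throw new RuntimeException("Could not convert from $assumedFrom");
    }
    return $converted;
}

echo to_utf8_for_db("\xC4\xE3\xBA\xC3"), PHP_EOL; // GBK bytes for "你好"
echo to_utf8_for_db('hello'), PHP_EOL;             // ASCII is left unchanged
```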
See the official documentation for mb_convert_encoding:
http://cn.php.net/manual/zh/function.mb-convert-encoding.php
Example: GBK to UTF-8 (this assumes the PHP file itself is saved in GBK, so the string literal is GBK bytes)
<?php
header("Content-Type: text/html; charset=UTF-8");
echo mb_convert_encoding("妳係我的友仔", "UTF-8", "GBK");
?>
Example: GB2312 to Big5 (this assumes the PHP file is saved in GB2312)
<?php
header("Content-Type: text/html; charset=big5");
echo mb_convert_encoding("你是我的朋友", "big5", "GB2312");
?>
To use this function, you must first enable the mbstring extension.
PHP's iconv function also converts string encodings and does much the same job as mb_convert_encoding.
There are some detailed examples below:
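iconv: Convert string to requested character encoding (PHP 4 >= 4.0.5, PHP 5)
mb_convert_encoding: Convert character encoding (PHP 4 >= 4.0.6, PHP 5)
Usage:
string mb_convert_encoding ( string str, string to_encoding [, mixed from_encoding] )
You must first enable the mbstring extension: in php.ini, change ";extension=php_mbstring.dll" to "extension=php_mbstring.dll" by removing the leading ";".
mb_convert_encoding can take several candidate input encodings and identifies the actual one from the content, but it is much slower than iconv.
string iconv ( string in_charset, string out_charset, string str )
Note: besides the target encoding, the second parameter can carry one of two suffixes, //TRANSLIT or //IGNORE. //TRANSLIT automatically replaces characters that cannot be converted directly with one or more similar-looking characters; //IGNORE drops characters that cannot be converted; by default the string is truncated at the first illegal character.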
Returns the converted string or FALSE on failure.
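Because iconv() signals failure with FALSE, it is worth checking the return value explicitly instead of passing it straight on. This is a minimal sketch; `iconv_strict` is a hypothetical name, and if your PHP version truncates at the first illegal character (as described above) rather than returning FALSE, this check will not detect the loss.

```php
<?php
// Hypothetical helper: strict conversion that reports failure instead of
// passing a FALSE result on. Returns null when iconv() returns false.
function iconv_strict(string $from, string $to, string $s): ?string
{
    // @ hides the notice iconv() emits; the return value is checked instead.
    $result = @iconv($from, $to, $s);
    return $result === false ? null : $result;
}

$ok   = iconv_strict('UTF-8', 'GB2312', '你好');
$fail = iconv_strict('UTF-8', 'GB2312', "A\u{1F600}B"); // the emoji is not in GB2312

if ($fail === null) {
    echo "conversion failed; consider //IGNORE or //TRANSLIT", PHP_EOL;
}
```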
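Usage notes:
I found that iconv fails when converting the character "—" to GB2312: without the IGNORE flag, everything after that character is lost. Either way, the "—" itself cannot be converted and is not output. mb_convert_encoding does not have this bug.
In general, use iconv; use mb_convert_encoding only when you cannot tell what the original encoding is, or when the output of iconv does not display correctly.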
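The rule of thumb above (iconv first, mb_convert_encoding only when the source encoding is uncertain) can be sketched as a small helper. Assumptions: the name `convert_to_utf8` and the candidate list are hypothetical, and iconv() can also "succeed" with a wrong guess and produce garbled text, which is the second case mentioned above and still has to be caught by looking at the output.

```php
<?php
// Hypothetical helper following the rule of thumb above: use iconv() with the
// expected source encoding first; fall back to mb_convert_encoding() with a
// list of candidate encodings only when iconv() fails.
function convert_to_utf8(string $s, string $expected = 'GBK'): string
{
    $result = @iconv($expected, 'UTF-8', $s);
    if ($result !== false) {
        return $result;
    }
    if (!extension_loaded('mbstring')) {
        throw new RuntimeException('iconv() failed and mbstring is not loaded');
    }
    // The candidate list is an assumption; adjust it to the data you expect.
    $converted = mb_convert_encoding($s, 'UTF-8', 'UTF-8, GBK, BIG-5');
    if ($converted === false) {
        throw new RuntimeException('Could not detect the source encoding');
    }
    return $converted;
}

echo convert_to_utf8("\xC4\xE3\xBA\xC3"), PHP_EOL; // GBK bytes, handled by iconv()
```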
from_encoding is the name of the character encoding before conversion. It can be an array or a string containing a comma-separated list. If it is not specified, the internal encoding is used.
/* Auto detect encoding from JIS, eucjp-win, sjis-win, then convert str to UCS-2LE */
$str = mb_convert_encoding($str, "UCS-2LE", "JIS, eucjp-win, sjis-win");
/* "auto" is expanded to "ASCII,JIS,UTF-8,EUC-JP,SJIS" */
$str = mb_convert_encoding($str, "EUC-JP", "auto");
Examples:
$content = iconv("GBK", "UTF-8", $content);
// or, with mbstring:
$content = mb_convert_encoding($content, "UTF-8", "GBK");
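An easily overlooked parameter when using iconv in PHP
Today, while processing scraped content, I found that the result of an iconv encoding conversion was cut off. I guessed it was a character-set problem and looked for a way to skip characters that do not exist in the target character set. The manual shows only three parameters for iconv, so it did not seem possible; some people online said it could be done, though I could not see how. Eventually I found that the English description says a flag can be appended to the target encoding: "TRANSLIT". But how do you append it? It turns out you add "//" first. A rather frustrating design.
Basic form: $txtContent = iconv("utf-8",'GBK',$txtContent);
With the special flag: iconv("UTF-8","GB2312//IGNORE",$data)
There are two optional flags, TRANSLIT and IGNORE (IGNORE means that characters which cannot be converted are skipped).
Description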
string iconv ( string in_charset, string out_charset, string str )
Performs a character set conversion on the string str from in_charset to out_charset. Returns the converted string or FALSE on failure.
If you append the string //TRANSLIT to out_charset transliteration is activated. This means that when a character can't be represented in the target charset, it can be approximated through one or several similarly looking characters. If you append the string //IGNORE, characters that cannot be represented in the target charset are silently discarded. Otherwise, str is cut from the first illegal character.