Introduction: Because of database requirements, the database was switched from its original gb2312 encoding to utf-8. To simplify data exchange and reduce problems caused by inconsistent encodings, the entire website needs to be transcoded as well (gb2312 --> utf-8).
1. Find a batch transcoding tool online
Notes: 1. The tool lets you select individual files or whole directories, and filter by file type or take all files. This is convenient but needs care: check that the selection contains nothing that should not be transcoded, such as files already in a different encoding or images. Do not transcode those along with the rest.
2. The tool does not detect duplicates, so take care not to select the same file twice (I have yet to test what happens if a file is selected twice).
3. If you check "Keep file backup", a bak file is generated for each transcoded file. My project is already managed with git, which can restore files itself, so I did not need the backups. Choose a backup method that fits your situation; either way, be cautious with a change this large.
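The post does not name the tool it used, but the per-file step it performs can be sketched in PHP. This is only an illustration under stated assumptions: the helper names `to_utf8` and `should_transcode`, the extension whitelist, and the choice of GBK (a superset of gb2312) as the source encoding are all hypothetical, and the `iconv` extension is assumed to be available.

```php
<?php
// Hypothetical helper, not the tool from the post: convert a file's bytes
// from GBK (assumed here as a superset of gb2312) to UTF-8.
// Returns null when the input is already valid UTF-8 or cannot be decoded.
function to_utf8(string $bytes, string $from = 'GBK'): ?string
{
    // With the /u modifier, preg_match fails on invalid UTF-8, which gives
    // a validity check that needs no extra extension.
    if (preg_match('//u', $bytes) === 1) {
        return null; // already UTF-8 (or plain ASCII): leave untouched
    }
    $converted = @iconv($from, 'UTF-8', $bytes);
    return $converted === false ? null : $converted;
}

// Assumed whitelist of text file types; images and other binaries are skipped.
function should_transcode(string $path): bool
{
    $ext = strtolower(pathinfo($path, PATHINFO_EXTENSION));
    return in_array($ext, ['php', 'htm', 'html', 'js', 'css'], true);
}
```

Because `to_utf8` returns null for input that is already valid UTF-8, a file selected twice would be skipped the second time rather than converted again. The check is a heuristic: a GBK file whose bytes happen to form valid UTF-8 would be skipped as well, although that is unlikely for real Chinese text.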
2. Remove the BOM
Open one of the files you just transcoded in the EditPlus editor. The status bar at the bottom shows the encoding as "UTF-8+", which means the file contains a BOM header.
What is a BOM? Quoting a netizen: "In a UTF-8 encoded file, the BOM sits at the head of the file, occupies three bytes, and marks the file as UTF-8 encoded. Much software now recognizes the BOM header, but some still does not; PHP, for example, does not recognize it. This is also why errors occur after editing a utf-8 file with Notepad."
As a result, when PHP runs the script it sends the BOM bytes as output. Calls that require nothing to have been output yet, such as session_start(), then produce an error.
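To make the byte-level picture concrete, here is a minimal sketch. The names `UTF8_BOM` and `has_bom` are hypothetical and only serve the illustration.

```php
<?php
// The UTF-8 BOM is the three bytes EF BB BF.
const UTF8_BOM = "\xEF\xBB\xBF";

// True when $bytes begins with the UTF-8 BOM.
function has_bom(string $bytes): bool
{
    return strncmp($bytes, UTF8_BOM, 3) === 0;
}

// The bytes of a script saved with a BOM: the three bytes sit in front of the
// opening tag, so PHP treats them like any other text outside the PHP tags
// and sends them as output.
$withBom = UTF8_BOM . "<?php session_start();";
echo bin2hex(substr($withBom, 0, 5)); // prints efbbbf3c3f
```

The first three bytes (`efbbbf`) are the BOM; `3c3f` is the `<?` of the opening tag.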
For a single file, open it in EditPlus and save it again as UTF-8 without BOM.
With this many files, a script shared by netizens removes the BOM headers quickly and accurately in batches (I could not find the original author; thank you for sharing). Create a PHP file in the root directory of the transcoded files, copy the following code into it, then open its URL in a browser to run it:
<?php
// Directory to scan: taken from ?dir=... in the URL, otherwise the script's own directory.
// $_GET['dir'] is used unchecked, so run this only on a local or trusted machine.
if (isset($_GET['dir'])) {
    $basedir = $_GET['dir'];
} else {
    $basedir = '.';
}
$auto = 1; // 1 = strip the BOM when found, 0 = only report it
checkdir($basedir);

// Recursively walk $basedir and check every file for a BOM.
function checkdir($basedir) {
    if ($dh = opendir($basedir)) {
        while (($file = readdir($dh)) !== false) {
            if ($file != '.' && $file != '..') {
                if (!is_dir($basedir."/".$file)) {
                    echo "filename: $basedir/$file ".checkBOM("$basedir/$file")."<br />\n";
                } else {
                    $dirname = $basedir."/".$file;
                    checkdir($dirname);
                }
            }
        }
        closedir($dh);
    }
}

// Report whether $filename starts with the UTF-8 BOM (bytes EF BB BF, i.e. 239 187 191);
// remove it when $auto is 1.
function checkBOM($filename) {
    global $auto;
    $contents = file_get_contents($filename);
    if (substr($contents, 0, 3) === "\xEF\xBB\xBF") {
        if ($auto == 1) {
            $rest = substr($contents, 3);
            rewrite($filename, $rest);
            return "BOM found, automatically removed.";
        } else {
            return "BOM found.";
        }
    }
    return "BOM Not Found.";
}

// Overwrite $filename with $data.
function rewrite($filename, $data) {
    $filenum = fopen($filename, "w");
    flock($filenum, LOCK_EX);
    fwrite($filenum, $data);
    fclose($filenum);
}
?>
3. Use the powerful Zend Studio to batch find and replace the gb2312 encoding declared in htm files with utf-8
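Note: In the newly created Zend project, check whether the htm files display correctly. If they show garbled text, check whether the project's encoding for htm files is set to utf-8. Then select the project and use the global search (Ctrl+H) to batch replace "charset=gb2312" with "charset=utf-8".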
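Note: Some files pulled in from outside the project may need to keep their gb2312 declaration. Exclude these exceptions so that they are not replaced along with the rest. Files that were transcoded in this round do need the replacement.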
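There may also be variants with a space, such as "charset= gb2312", alongside the ones without. Search for every spelling so that nothing slips through.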
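Instead of searching for each spelling separately, a single regular expression can tolerate the spacing and quoting variants. The following is a minimal sketch, not the Zend Studio procedure described above; the function name `replace_charset_decl` is hypothetical, and it should only be applied to files that were actually transcoded.

```php
<?php
// Hypothetical helper: rewrite charset declarations that name gb2312,
// tolerating spaces around "=", an optional quote, and mixed case.
function replace_charset_decl(string $html): string
{
    return preg_replace(
        '/(charset\s*=\s*["\']?\s*)gb2312\b/i',
        '${1}utf-8',
        $html
    );
}
```

The capture group keeps whatever spacing and quoting the file already used, so "charset= gb2312" becomes "charset= utf-8" and `<meta charset="GB2312">` becomes `<meta charset="utf-8">`. Declarations naming other encodings are left unchanged.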
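4. The key point is then the gb2312 (or gbk) references inside the php files: decide from the surrounding logic whether each one needs to be replaced. Search for every spelling here too, such as utf8, utf-8, gbk, and gb2312.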
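Since these occurrences need a human decision, it can help to list them with line numbers rather than replace them blindly. A minimal sketch, with the hypothetical helper name `find_encoding_mentions`:

```php
<?php
// Hypothetical helper: return the lines of a PHP source string that mention
// an encoding name, keyed by 1-based line number, for manual review.
function find_encoding_mentions(string $source): array
{
    $hits = [];
    foreach (preg_split('/\R/', $source) as $i => $line) {
        // Matches utf8, utf-8, gbk and gb2312 in any letter case.
        if (preg_match('/\b(?:utf-?8|gbk|gb2312)\b/i', $line)) {
            $hits[$i + 1] = trim($line);
        }
    }
    return $hits;
}
```

For each reported line, the surrounding code shows whether the encoding name describes data that has now become utf-8 (replace it) or data that still arrives as gb2312 or gbk from elsewhere (leave it).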