Code for using ADODB.Stream to determine file encoding in JScript

WBOY

Released: 2016-05-16 19:03:57

At first, I read the file as text with the ASCII charset to simulate reading binary data. However, for any byte greater than 127 I only got back a value below 128, equivalent to the byte modulo 128, so the ASCII charset does not work for this.

I kept searching and found an article on CodeProject.com, "Reading And Writing Binary Files Using JScript", which contained exactly what I needed.

The fix is simple: change the charset to 437. This is IBM's extended ASCII code page, which also uses the highest bit and so expands the character set from 128 to 256 characters. Character data read with this charset is equivalent to the original binary data.
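To see why a 7-bit read cannot stand in for binary data, here is a small sketch in plain JavaScript. It only models the "remainder of 128" effect described above; the helper name `sevenBit` is made up for illustration, and the code does not call ADODB.Stream.

```javascript
// Hypothetical helper: models what a 7-bit (ASCII) read keeps of a byte.
function sevenBit(byteValue) {
    return byteValue % 128; // the highest bit is lost
}

// Two different bytes collapse to the same value under a 7-bit read,
// so the original byte can no longer be recovered.
var fromHighByte = sevenBit(0xEF); // 0x6F
var fromLowByte = sevenBit(0x6F);  // 0x6F
```

With a 256-character code page such as 437, each of the 256 byte values maps to its own character, so no such collision occurs.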

With that obstacle out of the way, it is time to identify the file's encoding. Use the ADODB.Stream object to read the first two bytes of the file; those two bytes tell you what the file encoding is.

If a UTF-8 file has a BOM, its first two bytes are 0xEF and 0xBB (the full BOM is 0xEF 0xBB 0xBF). Similarly, a Unicode (UTF-16 little-endian) file starts with 0xFF and 0xFE, and a big-endian one with 0xFE and 0xFF. These byte values are the basis for identifying the file encoding.
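The byte patterns above can be sketched as a plain function over byte values. This is only an illustration, not the article's method: the name `detectBom` and the idea of passing the bytes as an array of numbers are assumptions, and the UTF-8 check here also looks at the third BOM byte when it is available.

```javascript
// Hypothetical helper: identify a BOM from the first bytes of a file.
// `bytes` is assumed to be an array of numbers in the range 0..255.
function detectBom(bytes) {
    if (bytes.length >= 3 &&
        bytes[0] === 0xEF && bytes[1] === 0xBB && bytes[2] === 0xBF) {
        return "UTF-8";
    }
    if (bytes.length >= 2 && bytes[0] === 0xFF && bytes[1] === 0xFE) {
        return "UTF-16LE"; // what the article calls "Unicode"
    }
    if (bytes.length >= 2 && bytes[0] === 0xFE && bytes[1] === 0xFF) {
        return "UTF-16BE"; // "Unicode big endian"
    }
    return null; // no recognizable BOM
}

var sample = detectBom([0xEF, 0xBB, 0xBF, 0x68, 0x69]);
```

A file without a BOM returns `null` here, so the caller still has to pick a fallback encoding.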

Note that the characters ADODB.Stream returns do not have the same numeric values as the underlying bytes. For example, if the byte is 0xEF, calling charCodeAt on the character that was read does not return 0xEF but a different value. The correspondence table can be found in the article mentioned above.
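As a rough illustration of that correspondence, the sketch below maps a few code page 437 characters back to their bytes. The table is deliberately partial (only the four BOM bytes discussed here), and the names `CP437_TO_BYTE` and `cp437CharToByte` are hypothetical; the complete table is in the CodeProject article.

```javascript
// Partial code page 437 table: character code (as returned by charCodeAt)
// mapped back to the original byte. Only the four BOM bytes are listed.
var CP437_TO_BYTE = {
    0x2229: 0xEF, // intersection sign
    0x2557: 0xBB, // box-drawing corner
    0x00A0: 0xFF, // no-break space
    0x25A0: 0xFE  // black square
};

// Hypothetical helper: recover the byte behind one character
// that was read with Charset "437".
function cp437CharToByte(ch) {
    var code = ch.charCodeAt(0);
    if (code < 128) {
        return code; // the lower half is plain ASCII
    }
    var b = CP437_TO_BYTE[code];
    return b === undefined ? -1 : b; // -1: not in this partial table
}

// What escape() yields for the two characters read at each BOM.
var utf8Bom = escape("\u2229\u2557");    // "%u2229%u2557"
var utf16LeBom = escape("\u00A0\u25A0"); // "%A0%u25A0"
var utf16BeBom = escape("\u25A0\u00A0"); // "%u25A0%A0"
```

Comparing escaped strings avoids embedding non-ASCII characters in the script source, which is why it is convenient to match on `escape()` output rather than on the characters themselves.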

Program code:


function CheckEncoding(filename) {
    var encoding;
    var stream = new ActiveXObject("ADODB.Stream");
    stream.Mode = 3;        // adModeReadWrite
    stream.Type = 2;        // adTypeText
    stream.Open();
    stream.Charset = "437"; // IBM code page 437: every byte maps to one character
    stream.LoadFromFile(filename);
    // Read the first two characters (one per byte) and escape them for comparison
    var bom = escape(stream.ReadText(2));
    switch (bom) {
        // 0xEF,0xBB => UTF-8
        case "%u2229%u2557":
            encoding = "UTF-8";
            break;
        // 0xFF,0xFE => Unicode (little endian)
        case "%A0%u25A0":
        // 0xFE,0xFF => Unicode big endian
        case "%u25A0%A0":
            encoding = "Unicode";
            break;
        // Cannot tell: fall back to GBK so that Chinese text is handled correctly in most cases
        default:
            encoding = "GBK";
            break;
    }
    stream.Close();
    stream = null;
    return encoding;
}


In this way, when needed, the encoding of the file can be obtained by calling the CheckEncoding function. 
I hope this article is helpful to you.