Regular expressions - Garbled Chinese characters when matching with C++ regex
迷茫
迷茫 2017-04-17 11:57:54
#include <cstdlib>
#include <iostream>
#include <fstream>
#include <string>
#include <regex>
using namespace std;
int main(){
    string str = "今天是个好日子圣达菲阿斯qweer";
    regex pattern("[\u4e00-\u9fa5]");
    sregex_token_iterator end;  // note: a default-constructed iterator marks the end of the matches
    for (sregex_token_iterator j(str.begin(), str.end(), pattern); j != end; ++j){
        cout << *j;
    }
    system("pause");
    return 0;
}

When matching Chinese text in C++, some of the characters come out garbled. Has anyone else run into this?


All replies (1)
左手右手慢动作

\u4e00-\u9fa5 is the Unicode range commonly used to match Chinese characters.
C++ does not support Unicode very well. If the program is compiled with VS on Windows, ordinary string literals are stored in the ANSI code page after compilation, which is GBK on a Chinese-locale system, while L"" literals are UTF-16 LE. Since C++11 you can try u8"" (UTF-8), u"" (UTF-16), and U"" (UTF-32) to choose the encoding of a Unicode string literal explicitly.

Judging from the source code, the regex used here is the one from the C++ standard library. Searching Stack Overflow, the general consensus is that the standard library's regex does not support Unicode well.
http://stackoverflow.com/questions/11254232/do-c11-regular-expressions...
http://stackoverflow.com/questions/15882991/range-of-utf-8-characters-...
http://stackoverflow.com/questions/17103925/how-well-is-unicode-supplor...

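I don't know whether switching to UTF-32 or UTF-16 fully solves the problem. The generally recommended approach is boost::regex + ICU.
This particular example looks like it can be solved with a UTF-16 string. Note that the standard library only provides std::regex for char and std::wregex for wchar_t, so with VS on Windows that means an L"" literal with std::wregex rather than u"".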
