#include <iostream>
#include <cstdlib>  // for system()
#include <string>
#include <regex>
using namespace std;

int main() {
    string str = "今天是个好日子圣达菲阿斯qweer";
    // intended to match one Chinese character in U+4E00..U+9FA5
    regex pattern("[\u4e00-\u9fa5]");
    sregex_token_iterator end; // note this line: a default-constructed iterator is the end-of-sequence marker
    for (sregex_token_iterator j(str.begin(), str.end(), pattern); j != end; ++j) {
        cout << *j;
    }
    system("pause");
    return 0;
}
When matching Chinese characters in C++, some of the output comes out garbled. Has anyone else run into this?
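\u4e00-\u9fa5 is the Unicode range commonly used to match Chinese characters. U+4E00 to U+9FA5 covers the original CJK Unified Ideographs; later Unicode versions extended that block up to U+9FFF.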
C++ does not support Unicode very well. If you compile with VS on Windows, ordinary string literals are stored in the ANSI code page after compilation, which is GBK on a Chinese-language system, and L"" strings become UTF-16 LE. Since C++11, you can try u8"" (UTF-8), u"" (UTF-16), and U"" (UTF-32) to specify which Unicode encoding a string literal uses. With a char-based std::regex, matching works one byte at a time, so the two-byte GBK characters can be split apart, which would explain why only some characters come out garbled.

The regex in the question is the one from the C++ standard library (<regex>). Searching Stack Overflow, the general answer is that the standard library's regex does not support Unicode well:
http://stackoverflow.com/questions/11254232/do-c11-regular-expressions...
http://stackoverflow.com/questions/15882991/range-of-utf-8-characters-...
http://stackoverflow.com/questions/17103925/how-well-is-unicode-supplor...
I don't know whether switching to UTF-32 or UTF-16 solves the problem. The generally recommended approach is boost::regex + ICU.
This particular example looks like it can be solved by using u"".