Character Encoding - daxi1987 - ChinaUnix Blog


Character Encoding

Category: C/C++

2008-06-09 16:24:14

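ASCII: encodes 128 characters (English letters, digits, punctuation and control characters) in 7 bits. Each character is stored in one byte (8 bits) whose highest bit is 0, so the values range from 00000000 to 01111111.
Simplified Chinese encoding: GB2312.
Unicode: Unicode is only a character set, a standard rather than an encoding implementation. It assigns every symbol a numeric code (its code point) but does not specify how that number is stored in bytes.
UTF-8: UTF-8 is the most widely used implementation of Unicode on the Internet. It encodes UCS (the Universal Character Set) in 8-bit units, using 1 to 4 bytes per character.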

Unicode:
Unicode provides a unique number for every character, no matter what the platform, no matter what the program, no matter what the language.

UTF:
Unicode Transformation Format

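UTF-8 encoding rules:
1) For a single-byte symbol, the first bit of the byte is set to 0 and the remaining 7 bits hold the symbol's Unicode code point. Therefore, for English letters (and all other ASCII characters), UTF-8 and ASCII encodings are identical.
2) For an n-byte symbol (n > 1), the first n bits of the first byte are all set to 1, bit n+1 is set to 0, and the first two bits of every following byte are set to 10. All remaining bits not mentioned above hold the symbol's Unicode code point.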
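The two rules can be sketched in C++ as below. This is a minimal illustration, not the author's code: `encode_utf8` is a hypothetical helper name, and the sketch assumes the modern limit of 4 bytes per character (code points up to U+10FFFF). Each `if` branch corresponds to one value of n: the lead byte gets n one-bits followed by a zero, and each continuation byte gets the prefix 10 plus 6 payload bits.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical helper: encode one Unicode code point as UTF-8.
// Each char in the returned string holds one byte of the encoding.
std::string encode_utf8(char32_t cp)
{
    std::string out;
    if (cp <= 0x7F) {
        // Rule 1: one byte, 0xxxxxxx (identical to ASCII).
        out += static_cast<char>(cp);
    } else if (cp <= 0x7FF) {
        // Rule 2, n = 2: 110xxxxx 10xxxxxx (11 payload bits).
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp <= 0xFFFF) {
        // Surrogate code points are reserved for UTF-16 and are not valid characters.
        if (cp >= 0xD800 && cp <= 0xDFFF)
            throw std::invalid_argument("surrogate code point");
        // Rule 2, n = 3: 1110xxxx 10xxxxxx 10xxxxxx (16 payload bits).
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp <= 0x10FFFF) {
        // Rule 2, n = 4: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx (21 payload bits).
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        throw std::invalid_argument("code point outside the Unicode range");
    }
    return out;
}
```

For example, the character 严 (U+4E25, binary 0100 1110 0010 0101) needs 16 bits, so it takes three bytes: 11100100 10111000 10100101, i.e. E4 B8 A5.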

UTF-8 to UTF-16 (Win32 API):

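#include <windows.h>
#include <string>

// Convert a NUL-terminated UTF-8 string to UTF-16.
// On Windows wchar_t is 16 bits, so std::wstring holds UTF-16 there.
// Uses wchar_t explicitly: TCHAR would be char in a non-UNICODE build.
std::wstring utf8_to_utf16(const char *utf8)
{
    if (utf8 == NULL)
        return std::wstring();
    // First call: passing -1 makes the result include the terminating NUL.
    int len = ::MultiByteToWideChar(CP_UTF8, 0, utf8, -1, NULL, 0);
    if (len <= 0)
        return std::wstring();                 // conversion error
    std::wstring ws(len, L'\0');               // std::wstring frees itself, no new/delete
    ::MultiByteToWideChar(CP_UTF8, 0, utf8, -1, &ws[0], len);
    ws.resize(len - 1);                        // drop the terminating NUL
    return ws;
}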
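MultiByteToWideChar exists only on Windows, and on Linux wchar_t is usually 32 bits, so std::wstring is not UTF-16 there. A rough portable sketch of the same conversion, assuming valid UTF-8 input, decodes each sequence with the UTF-8 rules above and then writes the code point as one UTF-16 unit, or as a surrogate pair above U+FFFF. The name `utf8_to_utf16_portable` is hypothetical, and the sketch does not reject overlong encodings.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical portable UTF-8 -> UTF-16 conversion (no Win32 calls).
std::u16string utf8_to_utf16_portable(const std::string &in)
{
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char b = static_cast<unsigned char>(in[i]);
        // The leading 1 bits of the first byte give the sequence length n.
        int n;
        char32_t cp;
        if (b < 0x80)                { n = 1; cp = b; }
        else if ((b & 0xE0) == 0xC0) { n = 2; cp = b & 0x1F; }
        else if ((b & 0xF0) == 0xE0) { n = 3; cp = b & 0x0F; }
        else if ((b & 0xF8) == 0xF0) { n = 4; cp = b & 0x07; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");
        if (i + n > in.size())
            throw std::invalid_argument("truncated UTF-8 sequence");
        // Each continuation byte must be 10xxxxxx and adds 6 bits.
        for (int k = 1; k < n; ++k) {
            unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }
        i += n;
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::invalid_argument("not a Unicode scalar value");
        if (cp <= 0xFFFF) {
            // Basic Multilingual Plane: one 16-bit unit.
            out += static_cast<char16_t>(cp);
        } else {
            // Supplementary planes: high surrogate, then low surrogate.
            cp -= 0x10000;
            out += static_cast<char16_t>(0xD800 + (cp >> 10));
            out += static_cast<char16_t>(0xDC00 + (cp & 0x3FF));
        }
    }
    return out;
}
```

For instance, the 4-byte sequence F0 9F 98 80 (U+1F600) becomes the surrogate pair D83D DE00, while E4 B8 A5 becomes the single unit 4E25.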

UTF-16 to UTF-8 (Win32 API):

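#include <windows.h>
#include <string>

// Convert a NUL-terminated UTF-16 string to UTF-8.
// CP_UTF8 is required: CP_ACP would produce the system ANSI code page
// (e.g. GBK on Simplified Chinese Windows), not UTF-8.
std::string utf16_to_utf8(const wchar_t *unicode)
{
    if (unicode == NULL)
        return std::string();
    // First call: query the size in bytes, including the terminating NUL (-1).
    int nLen = ::WideCharToMultiByte(CP_UTF8, 0, unicode, -1, NULL, 0, NULL, NULL);
    if (nLen <= 0)
        return std::string();                  // conversion error
    std::string str(nLen, '\0');               // new[] throws rather than returning NULL
    ::WideCharToMultiByte(CP_UTF8, 0, unicode, -1, &str[0], nLen, NULL, NULL);
    str.resize(nLen - 1);                      // drop the terminating NUL
    return str;
}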
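The reverse direction can be sketched portably in the same way, again as an illustration rather than a reference implementation: combine each surrogate pair into one code point, then apply the UTF-8 rules above. The name `utf16_to_utf8_portable` is hypothetical; unpaired surrogates are treated as errors here, which is one possible policy (replacing them with U+FFFD is another).

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical portable UTF-16 -> UTF-8 conversion (no Win32 calls).
std::string utf16_to_utf8_portable(const std::u16string &in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        // A high surrogate (D800-DBFF) must be followed by a low one (DC00-DFFF).
        if (cp >= 0xD800 && cp <= 0xDBFF) {
            if (i + 1 >= in.size() || in[i + 1] < 0xDC00 || in[i + 1] > 0xDFFF)
                throw std::invalid_argument("unpaired high surrogate");
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            throw std::invalid_argument("unpaired low surrogate");
        }
        // UTF-8 rules: 1 to 4 bytes depending on the size of the code point.
        if (cp <= 0x7F) {
            out += static_cast<char>(cp);
        } else if (cp <= 0x7FF) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp <= 0xFFFF) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```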

To be continued...


