| UCS-2 code point (hex) | UTF-8 byte sequence (binary) |
|---|---|
| 0000 - 007F | 0xxxxxxx |
| 0080 - 07FF | 110xxxxx 10xxxxxx |
| 0800 - FFFF | 1110xxxx 10xxxxxx 10xxxxxx |
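As a worked example of the three-byte row, take U+4E2D (中). Its 16 bits split into groups of 4 + 6 + 6, and each group fills the `x` positions of one byte in order:

```latex
\begin{aligned}
\mathtt{0x4E2D} &= \underbrace{0100}_{\text{bits }15..12}\;\underbrace{111000}_{\text{bits }11..6}\;\underbrace{101101}_{\text{bits }5..0} \\
&\mapsto \mathtt{1110}\,0100\quad \mathtt{10}\,111000\quad \mathtt{10}\,101101 \\
&= \mathtt{E4}\;\mathtt{B8}\;\mathtt{AD}
\end{aligned}
```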
Algorithm for converting UCS-2 to UTF-8 (c is the 16-bit UCS-2 code unit; b0, b1, b2 are the output bytes):
if c <= 0x7f
    // 1 byte: 0xxxxxxx
    b0 = c
else if c >= 0x80 && c <= 0x7ff
    // 2 bytes: 110xxxxx 10xxxxxx
    b0 = ((c >> 6) & 0x1f) | 0xc0
    b1 = (c & 0x3f) | 0x80
else
    // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
    b0 = (c >> 12) | 0xe0
    b1 = ((c >> 6) & 0x3f) | 0x80
    b2 = (c & 0x3f) | 0x80
end if
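The encoding pseudocode translates almost line for line into C. The sketch below is one possible version; the function name `ucs2_to_utf8` and its interface (a caller-supplied 3-byte buffer, returning the number of bytes written) are hypothetical choices, not something from the original text. It assumes the input is a single 16-bit BMP code unit and does not reject surrogate values 0xD800-0xDFFF.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one UCS-2 code unit as UTF-8.
 * Writes 1-3 bytes into out (caller provides room for 3) and returns the count.
 * Surrogate values 0xD800-0xDFFF are NOT rejected here. */
size_t ucs2_to_utf8(uint16_t c, uint8_t out[3])
{
    assert(out != NULL);
    if (c <= 0x7F) {                                   /* 0xxxxxxx */
        out[0] = (uint8_t)c;
        return 1;
    }
    if (c <= 0x7FF) {                                  /* 110xxxxx 10xxxxxx */
        out[0] = (uint8_t)(((c >> 6) & 0x1F) | 0xC0);
        out[1] = (uint8_t)((c & 0x3F) | 0x80);
        return 2;
    }
    /* 1110xxxx 10xxxxxx 10xxxxxx; the & 0x0F is redundant for 16-bit c
     * but makes the 4-bit payload explicit */
    out[0] = (uint8_t)(((c >> 12) & 0x0F) | 0xE0);
    out[1] = (uint8_t)(((c >> 6) & 0x3F) | 0x80);
    out[2] = (uint8_t)((c & 0x3F) | 0x80);
    return 3;
}
```

For example, `ucs2_to_utf8(0x4E2D, buf)` fills `buf` with `E4 B8 AD` and returns 3.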
Algorithm for converting UTF-8 to UCS-2 (b0 is the lead byte; b1 and b2 are the continuation bytes, if any):
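if (b0 & 0xe0) == 0xe0
    // 3 bytes: lead byte 1110xxxx (tested first, because it would also pass the 110xxxxx test below)
    c = (b0 & 0x0f) << 12
    c |= (b1 & 0x3f) << 6
    c |= b2 & 0x3f
else if (b0 & 0xc0) == 0xc0
    // 2 bytes: lead byte 110xxxxx
    c = (b0 & 0x1f) << 6
    c |= b1 & 0x3f
else
    // 1 byte: 0xxxxxxx (input is assumed to be valid UTF-8)
    c = b0
end if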
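The decoding pseudocode assumes well-formed input. The C sketch below follows the same bit extraction but, as an addition not found in the original, also rejects truncated input, bad continuation bytes, overlong forms, and 4-byte lead bytes (which fall outside UCS-2). The name `utf8_to_ucs2` and its return convention (bytes consumed, or 0 on error) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode one 1-3 byte UTF-8 sequence from s[0..len).
 * Stores the UCS-2 code unit in *out and returns the bytes consumed,
 * or returns 0 if the sequence is truncated, malformed, overlong,
 * or longer than 3 bytes. */
size_t utf8_to_ucs2(const uint8_t *s, size_t len, uint16_t *out)
{
    assert(s != NULL && out != NULL);
    if (len == 0)
        return 0;
    uint8_t b0 = s[0];
    if (b0 <= 0x7F) {                                  /* 0xxxxxxx */
        *out = b0;
        return 1;
    }
    if ((b0 & 0xE0) == 0xC0) {                         /* 110xxxxx 10xxxxxx */
        if (len < 2 || (s[1] & 0xC0) != 0x80)
            return 0;
        uint16_t c = (uint16_t)(((b0 & 0x1F) << 6) | (s[1] & 0x3F));
        if (c < 0x80)                                  /* overlong form */
            return 0;
        *out = c;
        return 2;
    }
    if ((b0 & 0xF0) == 0xE0) {                         /* 1110xxxx 10xxxxxx 10xxxxxx */
        if (len < 3 || (s[1] & 0xC0) != 0x80 || (s[2] & 0xC0) != 0x80)
            return 0;
        uint16_t c = (uint16_t)(((b0 & 0x0F) << 12) |
                                ((s[1] & 0x3F) << 6) |
                                (s[2] & 0x3F));
        if (c < 0x800)                                 /* overlong form */
            return 0;
        *out = c;
        return 3;
    }
    return 0;  /* stray continuation byte or 4-byte lead: not representable in UCS-2 */
}
```

A caller can walk a buffer by advancing the pointer by the returned count and stopping on 0.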