Category: WINDOWS
2015-01-22 10:04:21
Original article: Windows Support for Unicode (repost). Author: GilBert1987
char * p = "Hello!" ;
char a[] = "Hello!" ;
static char a[] = "Hello!" ;
Unicode is a specification for supporting all character sets, including character sets that cannot be represented in a single byte. If you are programming for an international market, consider using either Unicode or multibyte character sets (MBCSs), or enabling your program so that you can build it for either by changing a switch.
A wide character is a 2-byte multilingual character code. Most characters used in modern computing worldwide, including technical symbols and special publishing characters, can be represented according to the Unicode specification as a wide character.
A wide-character string is represented as a wchar_t[] array and is pointed to by a wchar_t* pointer. Any ASCII character can be represented as a wide character by prefixing the letter L to the character. For example, L'\0' is the terminating wide (16-bit) NULL character. Similarly, any ASCII string literal can be represented as a wide-character string literal by prefixing the letter L to the ASCII literal (L"Hello").
typedef unsigned short wchar_t ;
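wchar_t * p = L"Hello!" ; // The pointer variable p occupies 4 bytes (on 32-bit Windows), while the string itself needs 14 bytes: 2 bytes for each character plus 2 bytes for the terminating 0.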
static wchar_t a[] = L"Hello!" ;
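The storage arithmetic in the comment above can be checked with a small sketch. The helper names wide_storage_bytes and greeting_elements are made up for this example. Note that sizeof(wchar_t) is 2 with Microsoft compilers but is typically 4 on Linux and macOS, so the 14-byte figure applies to Windows only; the code therefore expresses sizes in terms of sizeof(wchar_t).

```c
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper: bytes needed to store a wide string,
   including the terminating L'\0'. */
size_t wide_storage_bytes(const wchar_t *s)
{
    return (wcslen(s) + 1) * sizeof(wchar_t);
}

/* An array initialized from a literal is sized to fit the literal
   and its terminator exactly. */
static wchar_t greeting[] = L"Hello!";

/* Number of elements in the array, terminator included. */
size_t greeting_elements(void)
{
    return sizeof(greeting) / sizeof(greeting[0]);
}
```

With a 2-byte wchar_t, wide_storage_bytes(L"Hello!") gives 7 * 2 = 14, matching the figure quoted for Windows.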
char * pc = "Hello!" ;
wchar_t * pw = L"Hello!" ;
'function' : incompatible types - from 'unsigned short *' to 'const char *'
0x0048 0x0065 0x006C 0x006C 0x006F 0x0021
48 00 65 00 6C 00 6C 00 6F 00 21 00
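The two lines above show the same six 16-bit values first as numbers and then as they sit in memory on a little-endian processor (low byte first). A minimal sketch of that byte ordering follows; store_utf16le is a hypothetical helper, and uint16_t is used in place of wchar_t so the result does not depend on the platform's wchar_t size.

```c
#include <stddef.h>
#include <stdint.h>

/* Write `count` 16-bit code units to `out` in little-endian order
   (low byte first). `out` must hold at least 2 * count bytes.
   Returns the number of bytes written. */
size_t store_utf16le(const uint16_t *units, size_t count, unsigned char *out)
{
    for (size_t i = 0; i < count; ++i) {
        out[2 * i]     = (unsigned char)(units[i] & 0xFF);        /* low byte  */
        out[2 * i + 1] = (unsigned char)((units[i] >> 8) & 0xFF); /* high byte */
    }
    return 2 * count;
}
```

Feeding it the values 0x0048 0x0065 0x006C 0x006C 0x006F 0x0021 yields the byte sequence 48 00 65 00 6C 00 6C 00 6F 00 21 00.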
The wide-character version of strlen is wcslen (wide-character string length). It is declared in both STRING.H (which also declares strlen) and WCHAR.H. The two functions are declared as follows:
size_t __cdecl strlen (const char *) ;
size_t __cdecl wcslen (const wchar_t *) ;
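To illustrate why the two functions are not interchangeable, here is a short sketch. my_wcslen is a hand-written stand-in for wcslen, shown only to make the loop visible, and hello_utf16le is an assumed byte image of the wide string as it would appear in memory on Windows. Because strlen scans single bytes, it stops at the zero high byte of the first character.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hand-written equivalent of wcslen, for illustration only: counts
   wide characters up to (not including) the terminating L'\0'. */
size_t my_wcslen(const wchar_t *s)
{
    const wchar_t *p = s;
    while (*p != L'\0')
        ++p;
    return (size_t)(p - s);
}

/* The UTF-16 little-endian bytes of "Hello!" followed by a 16-bit
   terminator (assumed Windows memory layout). */
static const char hello_utf16le[] = {
    0x48, 0x00, 0x65, 0x00, 0x6C, 0x00, 0x6C, 0x00,
    0x6F, 0x00, 0x21, 0x00, 0x00, 0x00
};

/* strlen stops at the first zero byte, which comes right after 'H'. */
size_t strlen_on_wide_bytes(void)
{
    return strlen(hello_utf16le);
}
```

Here strlen_on_wide_bytes() returns 1, whereas my_wcslen(L"Hello!") returns 6.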
The TCHAR data type is a Win32 character string that can be used to describe ANSI, double-byte character set (DBCS), or Unicode strings. For ANSI and DBCS platforms, TCHAR is defined as shown in the following Syntax section. For Unicode platforms, TCHAR is defined as synonymous with the WCHAR type.
typedef char TCHAR;
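One approach is to use the TCHAR.H header file included with Microsoft Visual C++. This header is not part of the ANSI C standard, so every function and macro defined in it begins with an underscore. TCHAR.H provides a set of alternative names for the standard run-time library functions that take string arguments (for example, _tprintf and _tcslen). These are sometimes called generic function names, because they can refer to either the Unicode or the non-Unicode version of a function.
And so on. TCHAR.H also uses a new data type, TCHAR, to solve the problem of the two character data types. If the _UNICODE identifier is defined, TCHAR is wchar_t: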
_TEXT ("Hello!")
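The mechanism can be sketched without TCHAR.H itself. In the fragment below, MY_UNICODE, XCHAR, XTEXT and xcslen are hypothetical stand-ins for _UNICODE, TCHAR, _TEXT and _tcslen; this is a simplified imitation of the idea, not the contents of the real header.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical stand-ins for the generic-text names. Define
   MY_UNICODE (for example with -DMY_UNICODE) to select the wide
   versions; otherwise the narrow versions are used. */
#ifdef MY_UNICODE
typedef wchar_t XCHAR;
#define XTEXT_(x) L##x
#define xcslen wcslen
#else
typedef char XCHAR;
#define XTEXT_(x) x
#define xcslen strlen
#endif
#define XTEXT(x) XTEXT_(x)

/* The same source line compiles to either character type. */
static const XCHAR greeting[] = XTEXT("Hello!");

/* Length in characters, whichever type XCHAR happens to be. */
size_t greeting_length(void)
{
    return xcslen(greeting);
}
```

Either way the build is configured, greeting_length() returns 6; only the element size of greeting changes.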
typedef wchar_t WCHAR ; // wc
typedef CHAR * PCHAR, * LPCH, * PCH, * NPSTR, * LPSTR, * PSTR ;
typedef CONST CHAR * LPCCH, * PCCH, * LPCSTR, * PCSTR ;
typedef WCHAR * PWCHAR, * LPWCH, * PWCH, * NWPSTR, * LPWSTR, * PWSTR ;
typedef CONST WCHAR * LPCWCH, * PCWCH, * LPCWSTR, * PCWSTR ;
#ifdef UNICODE
typedef WCHAR TCHAR, * PTCHAR ;
typedef LPWSTR LPTCH, PTCH, PTSTR, LPTSTR ;
#else
typedef char TCHAR, * PTCHAR ;
typedef LPSTR LPTCH, PTCH, PTSTR, LPTSTR ;
#endif
#define __TEXT(quote) L##quote
#define TEXT(quote) __TEXT(quote)
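A likely reason for defining TEXT in terms of a second macro is the usual preprocessor rule that an argument used with ## is not macro-expanded first. The sketch below reproduces the pattern with made-up names (WIDEN_DIRECT, WIDEN, APP_NAME) to show the difference.

```c
#include <wchar.h>

/* WIDEN_DIRECT pastes L onto its argument before the argument is
   macro-expanded; WIDEN passes through a second macro so that the
   argument is expanded first. Both names are hypothetical. */
#define WIDEN_DIRECT(quote) L##quote
#define WIDEN(quote) WIDEN_DIRECT(quote)

#define APP_NAME "Hello!"   /* hypothetical string macro */

/* WIDEN(APP_NAME) first expands APP_NAME to "Hello!" and then pastes,
   giving L"Hello!". WIDEN_DIRECT(APP_NAME) would instead produce the
   single token LAPP_NAME, which is not declared anywhere. */
static const wchar_t app_name[] = WIDEN(APP_NAME);

const wchar_t *get_app_name(void)
{
    return app_name;
}
```

For a plain literal such as WIDEN_DIRECT("Hi") the one-level macro is enough; the extra level only matters when the argument is itself a macro.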
iLength = lstrlen (pString) ;
pString = lstrcpy (pString1, pString2) ;
pString = lstrcpyn (pString1, pString2, iCount) ;
pString = lstrcat (pString1, pString2) ;
iComp = lstrcmp (pString1, pString2) ;
iComp = lstrcmpi (pString1, pString2) ;
CString is based on the TCHAR data type. If the symbol _UNICODE is defined for a build of your program, TCHAR is defined as type wchar_t, a 16-bit character encoding type; otherwise, it is defined as char, the normal 8-bit character encoding. Under Unicode, then, CStrings are composed of 16-bit characters. Without Unicode, they are composed of characters of type char.
Use the _T macro to conditionally code literal strings to be portable to Unicode.
When you pass strings, pay attention to whether function arguments require a length in characters or a length in bytes. The difference is important if you're using Unicode strings.
Use portable versions of the C run-time string-handling functions.
Use the following data types for characters and character pointers:
LPCTSTR Where you would use const char*. CString provides the operator LPCTSTR to convert between CString and LPCTSTR.
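The point about lengths in characters versus lengths in bytes can be made concrete with a small sketch. ARRAY_COUNT and copy_wide are hypothetical names; the example uses the standard wcsncpy, whose limit is a count of wide characters, so passing sizeof(buf) there would overstate the buffer's capacity.

```c
#include <stddef.h>
#include <wchar.h>

/* Element count of a true array (not a pointer). Hypothetical macro. */
#define ARRAY_COUNT(a) (sizeof(a) / sizeof((a)[0]))

/* Copy src into dst, which can hold dst_count wide characters
   (a count of characters, not bytes). The result is always
   terminated. Returns the number of characters copied. */
size_t copy_wide(wchar_t *dst, size_t dst_count, const wchar_t *src)
{
    if (dst_count == 0)
        return 0;
    wcsncpy(dst, src, dst_count - 1);
    dst[dst_count - 1] = L'\0';
    return wcslen(dst);
}
```

For a buffer declared as wchar_t buf[4], ARRAY_COUNT(buf) is 4 while sizeof(buf) is 4 * sizeof(wchar_t); copy_wide(buf, ARRAY_COUNT(buf), L"Hello!") stores "Hel" plus the terminator.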
· The class library is also enabled for multibyte character sets — specifically for double-byte character sets (DBCS).
· Under this scheme, a character can be either one or two bytes wide. If it is two bytes wide, its first byte is a special "lead byte," chosen from a particular range depending on which code page is in use. Taken together, the lead and "trail bytes" specify a unique character encoding.
· If the symbol _MBCS is defined for a build of your program, type TCHAR, on which CString is based, maps to char. It's up to you to determine which bytes in a CString are lead bytes and which are trail bytes. The C run-time library supplies functions to help you determine this.
· Under DBCS, a given string can contain all single-byte ANSI characters, all double-byte characters, or a combination of the two. These possibilities require special care in parsing strings, including CString objects.
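As a rough illustration of the lead-byte and trail-byte parsing described in the list above, the sketch below counts characters rather than bytes in a DBCS string. The lead-byte range 0x81 to 0xFE is an assumption made for this example (it matches code page 936, GBK); is_lead_byte and dbcs_char_count are hypothetical names. Real code would ask the system instead of hard-coding a range, for example with the Win32 function IsDBCSLeadByte.

```c
#include <stddef.h>

/* Assumed lead-byte range for this sketch (0x81..0xFE, as in code
   page 936). The real range depends on the code page in use. */
static int is_lead_byte(unsigned char c)
{
    return c >= 0x81 && c <= 0xFE;
}

/* Count characters (not bytes) in a NUL-terminated DBCS string. */
size_t dbcs_char_count(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;
    size_t count = 0;

    while (*p != '\0') {
        /* A lead byte followed by a trail byte forms one character;
           a dangling lead byte at the end is counted on its own. */
        if (is_lead_byte(*p) && p[1] != '\0')
            p += 2;
        else
            p += 1;
        ++count;
    }
    return count;
}
```

For the 11-byte string "\xD1\xA7\xCF\xB0" "CString" (two double-byte characters followed by seven ASCII letters), dbcs_char_count returns 9.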
By definition, the ASCII character set is a subset of all multibyte-character sets. In many multibyte character sets, each character in the range 0x00 – 0x7F is identical to the character that has the same value in the ASCII character set. For example, in both ASCII and MBCS character strings, the 1-byte NULL character ('\0') has value 0x00 and indicates the terminating null character.
char cTest[] = "学习CString";
int cLength = sizeof(cTest);//12(4+7+1)
CString csTest = _T("学习CString");
int csLength = csTest.GetLength();//9 (number of characters)
int csConvertLength = ::WideCharToMultiByte(CP_ACP,0,csTest,-1,NULL,0,NULL,0);//12
char * cConvertChar = new char[csConvertLength];
wchar_t wcTest[] = L"学习CString";
int wcLength = sizeof(wcTest);//20((9 + 1)* 2)
TCHAR tTest2[] = _T("CString");
int tLength1 = sizeof(tTest1);//20((9 + 1)* 2)
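The sizes quoted in the comments above can be reproduced in a platform-independent way. In this sketch uint16_t stands in for the 2-byte wchar_t of Windows, the code points U+5B66 and U+4E60 are the two Chinese characters of the test string, and the narrow version assumes the GBK byte values D1 A7 CF B0 for them. The function names are made up for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* The test string as 16-bit code units: two Chinese characters,
   seven ASCII letters, and a terminator. */
static const uint16_t wide_text[] = {
    0x5B66, 0x4E60, 'C', 'S', 't', 'r', 'i', 'n', 'g', 0
};

/* The same text in a double-byte code page (GBK bytes assumed). */
static const char dbcs_text[] = "\xD1\xA7\xCF\xB0" "CString";

/* Bytes occupied by the wide array, terminator included. */
size_t wide_text_bytes(void) { return sizeof(wide_text); }

/* Bytes occupied by the DBCS array, terminator included. */
size_t dbcs_text_bytes(void) { return sizeof(dbcs_text); }

/* Characters in the wide string, terminator excluded. */
size_t wide_text_chars(void)
{
    return sizeof(wide_text) / sizeof(wide_text[0]) - 1;
}
```

These return 20, 12 and 9 respectively, which agrees with (9 + 1) * 2, 4 + 7 + 1, and the character count reported by GetLength in a Unicode build.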