The Wide-Character Literal L - vc2 - ChinaUnix Blog

2008-05-06 22:19:55

Let's learn together.

How the wide-character literal L"xx" is implemented differently in VC6.0/7.0 and GNU g++

Author: 乾坤一笑

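Prologue: this article grew out of a discussion with 周星星 (Zhou Xingxing) on the VCKBASE C++ forum, which pushed me to trace the question back to its source and find both the theoretical basis and practical proof. (Some of the material and test code in this article were provided by 周星星.)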

The C++ Programming Language (3rd edition) contains these two passages:

from 4.3:
A type wchar_t is provided to hold characters of a larger character set such as Unicode. It is a distinct type. The size of wchar_t is implementation-defined and large enough to hold the largest character set supported by the implementation’s locale (see §21.7, §C.3.3). The strange name is a leftover from C. In C, wchar_t is a typedef (§4.9.7) rather than a builtin type. The suffix _t was added to distinguish standard typedefs.

from 4.3.1:
Wide character literals are of the form L'ab', where the number of characters between the quotes and their meanings is implementation-defined to match the wchar_t type. A wide character literal has type wchar_t.

Two points in these passages concern us:

the size of wchar_t is implementation-defined;
the meaning of L"ab" is implementation-defined.

So how do GNU g++ and VC6.0/7.0 each implement it? Consider the following code:

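// author: **.Zhou
#include <cstddef>
#include <cstdio>

// Print the n raw bytes starting at padd as hexadecimal values.
void prt( const void* padd, std::size_t n )
{
    const unsigned char* p = static_cast<const unsigned char*>( padd );
    const unsigned char* pe = p + n;
    // Loop body reconstructed; the original listing is truncated here.
    for( ; p < pe; ++p )
        std::printf( "%02X ", *p );
    std::printf( "\n" );
}

// (The rest of the original listing, including main(), has not survived.)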

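This code shows that g++ (Dev-C++ uses the MinGW compiler) interprets L"xx" by taking "xx" as a non-wide string and widening each of its char elements into a wchar_t, filling the high-order bits with 0. VC6.0, by contrast, interprets L"xx" by converting "xx", taken as an MBCS string, into Unicode WCHARs. MBCS as currently used takes char as its storage unit, and winnt.h defines WCHAR as typedef wchar_t WCHAR. On the Windows platform, any char value outside the range 0 to 127 is treated as part of an MBCS character, which consists of one or two bytes; the MBCS character set depends on the locale's code page. On a particular Windows system, the default code page can be set under Control Panel -> Regional Options.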


      


The conclusions above can be verified with the following program:

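// author: smileonce
#include <cstddef>
#include <cstdio>

// Print the n raw bytes starting at padd as hexadecimal values.
void prt( const void* padd, std::size_t n )
{
    const unsigned char* p = static_cast<const unsigned char*>( padd );
    const unsigned char* pe = p + n;
    // Loop body reconstructed; the original listing is truncated here.
    for( ; p < pe; ++p )
        std::printf( "%02X ", *p );
    std::printf( "\n" );
}

// (The rest of the original listing, including main(), has not survived.)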

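Well, the question is now settled. To sum up:

In ISO C, wchar_t is a typedef; in ISO C++ it is a built-in type. L"xx" is the syntax built into ISO C/C++ for a string literal of wchar_t;

the size of wchar_t is implementation-defined;

the meaning of L"xx" is implementation-defined;

a plain "xx" is a non-wide (narrow) string whose elements are of type char, while the corresponding L"xx" is a wide string whose elements are of type wchar_t.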

      

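Why do C/C++ leave the meaning of L"xx" to the implementation? Clearly for generality and portability. Bjarne's view is that the C++ approach lets programmers use any character set as the character type of a string. Besides, Unicode has already gone through several versions, and nobody knows whether it will remain adequate forever. For a detailed discussion of Unicode and how it compares with other character sets, I recommend the book 《无废话xml》.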


      


      

    

