J2SE 5.0 Supports Unicode 4.0 - bendeer - ChinaUnix Blog

J2SE 5.0 Supports Unicode 4.0

Category: Java

2010-04-07 21:25:23

Java programs are written using the Unicode character set. Information about this character set and its associated character encodings is available from the Unicode Consortium.

The Java platform tracks the Unicode specification as it evolves. The precise version of Unicode used by a given release is specified in the documentation of the class Character.

Versions of the Java programming language prior to 1.1 used Unicode version 1.1.5. Upgrades to newer versions of the Unicode Standard occurred in JDK 1.1 (to Unicode 2.0), JDK 1.1.7 (to Unicode 2.1), J2SE 1.4 (to Unicode 3.0), and J2SE 5.0 (to Unicode 4.0).

(J2SE 6.0 also uses Unicode 4.0.)

The Unicode standard was originally designed as a fixed-width 16-bit character encoding. It has since been changed to allow for characters whose representation requires more than 16 bits. The range of legal code points is now U+0000 to U+10FFFF, using the hexadecimal U+n notation. Characters whose code points are greater than U+FFFF are called supplementary characters. To represent the complete range of characters using only 16-bit units, the Unicode standard defines an encoding called UTF-16. In this encoding, supplementary characters are represented as pairs of 16-bit code units, the first from the high-surrogates range (U+D800 to U+DBFF), the second from the low-surrogates range (U+DC00 to U+DFFF). For characters in the range U+0000 to U+FFFF, the values of code points and UTF-16 code units are the same.
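The surrogate-pair arithmetic described above can be sketched in a few lines of Java. This is only an illustration: the class name `SurrogatePairSketch` and the method `encode` are made up for this example, and the result is compared against the standard library method `Character.toChars`, which has been available since J2SE 5.0.

```java
import java.util.Arrays;

// Illustrative sketch of UTF-16 encoding; the class and method names are hypothetical.
final class SurrogatePairSketch {

    // Returns the UTF-16 code units for a code point: one unit for U+0000..U+FFFF,
    // two units (high surrogate, then low surrogate) for supplementary characters.
    static char[] encode(int codePoint) {
        if (codePoint < 0 || codePoint > 0x10FFFF) {
            throw new IllegalArgumentException("Not a legal code point: " + codePoint);
        }
        if (codePoint <= 0xFFFF) {
            return new char[] { (char) codePoint };
        }
        int offset = codePoint - 0x10000;               // 20 significant bits remain
        char high = (char) (0xD800 + (offset >> 10));   // top 10 bits
        char low = (char) (0xDC00 + (offset & 0x3FF));  // bottom 10 bits
        return new char[] { high, low };
    }

    public static void main(String[] args) {
        int gClef = 0x1D11E; // MUSICAL SYMBOL G CLEF, a supplementary character
        char[] units = encode(gClef);
        // prints: U+1D11E -> D834 DD1E
        System.out.printf("U+%X -> %04X %04X%n", gClef, (int) units[0], (int) units[1]);
        // The library method performs the same encoding; prints: true
        System.out.println(Arrays.equals(units, Character.toChars(gClef)));
    }
}
```

In real code there is no reason to reimplement this; `Character.toChars` is the method to call. The hand-written version is shown only to make the bit layout visible.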

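The Java programming language represents text as sequences of 16-bit code units, using the UTF-16 encoding. A few APIs, primarily in the Character class, use 32-bit integers to represent individual code points. The Java platform provides methods to convert between the two representations.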
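As a rough illustration of the conversion between the two representations, the sketch below goes from an `int` code point to UTF-16 `char` values and back. The wrapper class `CodePointConversionSketch` and its two helper methods are hypothetical names for this example; the calls inside them (`Character.toChars`, `String.codePointAt`, `String.codePointCount`, `Character.toCodePoint`, `Character.charCount`) are standard methods added in J2SE 5.0.

```java
// Illustrative sketch; the class and helper names are hypothetical.
final class CodePointConversionSketch {

    // int code point -> String holding one or two UTF-16 code units.
    static String toUtf16(int codePoint) {
        return new String(Character.toChars(codePoint));
    }

    // UTF-16 text -> the code point that starts at the given char index.
    static int toCodePoint(String s, int index) {
        return s.codePointAt(index);
    }

    public static void main(String[] args) {
        String text = toUtf16(0x1D11E) + "A";

        System.out.println(text.length());                         // 3 code units
        System.out.println(text.codePointCount(0, text.length())); // 2 code points

        int codePoint = toCodePoint(text, 0);
        System.out.printf("U+%X%n", codePoint);                    // U+1D11E

        // The same code point, rebuilt from its two surrogates; prints: true
        System.out.println(Character.toCodePoint(text.charAt(0), text.charAt(1)) == codePoint);
        // Number of char values needed for this code point; prints: 2
        System.out.println(Character.charCount(codePoint));
    }
}
```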

The specification quoted here uses the terms code point and UTF-16 code unit where the representation is relevant, and the generic term character where the representation is irrelevant to the discussion. (APIs usually use the name codePoint for variables of type int that represent code points, while UTF-16 code units are, of course, of type char.)
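The distinction between code points (`int`) and code units (`char`) matters most when walking through a string. The following is a minimal sketch, assuming a hypothetical helper class `CodePointIterationSketch`; it follows the naming convention mentioned above by calling the `int` variable `codePoint`.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch; the class and method names are hypothetical.
final class CodePointIterationSketch {

    // Collects the code points of a string, advancing by one or two
    // char values per step depending on the code point just read.
    static List<Integer> codePoints(String s) {
        List<Integer> result = new ArrayList<Integer>();
        for (int i = 0; i < s.length(); ) {
            int codePoint = s.codePointAt(i);
            result.add(codePoint);
            i += Character.charCount(codePoint);
        }
        return result;
    }

    public static void main(String[] args) {
        // 'a', MUSICAL SYMBOL G CLEF (written as its surrogate pair), 'b'
        String text = "a\uD834\uDD1Eb";
        // prints: U+0061 U+1D11E U+0062
        for (int codePoint : codePoints(text)) {
            System.out.printf("U+%04X ", codePoint);
        }
        System.out.println();
    }
}
```

A loop over `charAt(i)` would instead visit four `char` values here and see the two surrogates separately, which is usually not what text-processing code wants.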

Except for comments, identifiers, and the contents of character and string literals, all input elements in a program are formed only from ASCII characters (or Unicode escapes which result in ASCII characters). ASCII (ANSI X3.4) is the American Standard Code for Information Interchange. The first 128 characters of the Unicode character encoding are the ASCII characters.
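A small sketch may help make the role of Unicode escapes and the ASCII subset concrete. The class `UnicodeEscapeSketch` and its methods are hypothetical names chosen for this example. It relies on the fact that the compiler translates Unicode escapes before it splits the source into tokens, so an escape can stand for part of an identifier.

```java
import java.nio.charset.Charset;

// Illustrative sketch; the class and method names are hypothetical.
final class UnicodeEscapeSketch {

    // The escape for LATIN SMALL LETTER X is translated before tokenization,
    // so the declaration below introduces an ordinary variable named x.
    static int escapedAsciiIdentifier() {
        int \u0078 = 42;
        return x;
    }

    // Identifiers may contain non-ASCII letters. GREEK SMALL LETTER PI is
    // written as an escape here so the source file itself stays pure ASCII.
    static double nonAsciiIdentifier() {
        double \u03C0 = Math.PI;
        return \u03C0;
    }

    // For the first 128 code points, the Unicode value equals the ASCII byte value.
    static boolean matchesAscii(char c) {
        byte[] bytes = String.valueOf(c).getBytes(Charset.forName("US-ASCII"));
        return c < 128 && bytes.length == 1 && bytes[0] == c;
    }

    public static void main(String[] args) {
        System.out.println(escapedAsciiIdentifier());         // 42
        System.out.println(nonAsciiIdentifier() == Math.PI);  // true
        System.out.println(matchesAscii('A'));                // true
        System.out.println(matchesAscii('\u00E9'));           // false: outside ASCII
    }
}
```

Because escapes are processed everywhere in the source, including inside comments, the comments in this sketch name the characters instead of spelling out their escapes.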

