Basic Concepts
"Character" and "byte" are different! You must understand this before continuing. A "byte" is an 8-bit thing; it is the unit of space in computers (today). A "character" is composed of one or more bytes, and represents what we think of as a single letter or symbol when reading.
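To see the difference in practice, here is a minimal sketch that counts characters in a UTF-8 string by skipping continuation bytes (bytes of the form 10xxxxxx). It assumes the input is valid UTF-8; the function name `utf8_char_count` is hypothetical, chosen for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Count characters (code points) in a valid UTF-8 string.
 * Every byte that is NOT a continuation byte (10xxxxxx) starts a new character. */
size_t utf8_char_count(const char *s)
{
    size_t count = 0;
    for (const unsigned char *p = (const unsigned char *)s; *p != '\0'; p++) {
        if ((*p & 0xC0) != 0x80) /* ASCII byte or lead byte of a multibyte sequence */
            count++;
    }
    return count;
}
```

For the two Chinese characters "中文" (bytes E4 B8 AD E6 96 87), `strlen` reports 6 bytes while `utf8_char_count` reports 2 characters.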
A byte can represent only 256 (2^8) different values. There are over 11,000 Korean characters and over 40,000 Chinese characters, so there is no way to squeeze such a character into a single byte.
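As an illustration of why these characters need several bytes, the sketch below returns how many bytes UTF-8 uses for a given Unicode code point. It is a simplification: it does not reject the surrogate range U+D800..U+DFFF, and the name `utf8_encoded_len` is hypothetical.

```c
#include <stdint.h>

/* Bytes UTF-8 needs to encode one code point; 0 if beyond U+10FFFF.
 * Simplified: surrogate code points are not rejected. */
int utf8_encoded_len(uint32_t cp)
{
    if (cp <= 0x7F)     return 1; /* ASCII fits in a single byte */
    if (cp <= 0x7FF)    return 2;
    if (cp <= 0xFFFF)   return 3; /* e.g. Hangul syllables, common CJK ideographs */
    if (cp <= 0x10FFFF) return 4; /* supplementary planes, e.g. emoji */
    return 0;
}
```

For example, 'A' (U+0041) takes 1 byte, while the Hangul syllable 한 (U+D55C) and the ideograph 中 (U+4E2D) take 3 bytes each.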
Charset vs collation. These are different things! 'Charset' ('character set'; 'encoding') refers to the bits used to represent 'characters'. 'Collation' refers to how those bits are compared for inequality (WHERE) and for sorting (ORDER BY). GROUP BY and FOREIGN KEY constraints can also involve collation. A collation can even decide that two different bit strings compare 'equal'.
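A small sketch of how the same bytes can compare differently under two collations: `cmp_binary` compares raw bytes (like a binary collation), while `cmp_ascii_ci` folds ASCII letters first (a rough stand-in for a case-insensitive `_ci` collation). Both function names are hypothetical, and real collations such as utf8_general_ci do much more (for example, folding accented letters), which this sketch does not attempt.

```c
#include <ctype.h>

/* Binary comparison: raw byte values decide order and equality. */
int cmp_binary(const char *a, const char *b)
{
    const unsigned char *x = (const unsigned char *)a;
    const unsigned char *y = (const unsigned char *)b;
    while (*x != '\0' && *x == *y) {
        x++;
        y++;
    }
    return (*x > *y) - (*x < *y); /* -1, 0 or 1 */
}

/* Simplified case-insensitive comparison: only ASCII letters are folded
 * (tolower in the default "C" locale leaves other bytes unchanged). */
int cmp_ascii_ci(const char *a, const char *b)
{
    const unsigned char *x = (const unsigned char *)a;
    const unsigned char *y = (const unsigned char *)b;
    for (;;) {
        int cx = tolower(*x);
        int cy = tolower(*y);
        if (cx != cy || cx == '\0')
            return (cx > cy) - (cx < cy);
        x++;
        y++;
    }
}
```

Under `cmp_binary`, "abc" and "ABC" differ and "a" sorts after "B" (0x61 > 0x42); under `cmp_ascii_ci`, "abc" equals "ABC" and "a" sorts before "B". Same charset, different collation, different WHERE and ORDER BY results.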
FOR EXAMPLE
CHARACTER SET FOR UTF8
mysql> select * from information_schema.character_sets where CHARACTER_SET_NAME='utf8'\G
*************************** 1. row ***************************
  CHARACTER_SET_NAME: utf8
DEFAULT_COLLATE_NAME: utf8_general_ci
         DESCRIPTION: UTF-8 Unicode
              MAXLEN: 3
1 row in set (0.00 sec)
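MAXLEN: 3 means MySQL's utf8 (an alias for utf8mb3) stores at most 3 bytes per character, so characters that need 4 bytes in UTF-8, such as emoji, do not fit; utf8mb4 has MAXLEN 4. The sketch below checks whether a valid UTF-8 string stays within a given per-character byte limit. The function name `fits_maxlen` is hypothetical.

```c
#include <stdbool.h>

/* Return true if every character of the valid UTF-8 string s is encoded
 * in at most max_len bytes (i.e. it fits a charset with that MAXLEN). */
bool fits_maxlen(const char *s, int max_len)
{
    for (const unsigned char *p = (const unsigned char *)s; *p != '\0'; p++) {
        int len;
        if ((*p & 0x80) == 0x00)      len = 1; /* 0xxxxxxx: ASCII */
        else if ((*p & 0xE0) == 0xC0) len = 2; /* 110xxxxx lead byte */
        else if ((*p & 0xF0) == 0xE0) len = 3; /* 1110xxxx lead byte */
        else if ((*p & 0xF8) == 0xF0) len = 4; /* 11110xxx lead byte */
        else continue;                         /* continuation byte: counted with its lead */
        if (len > max_len)
            return false;
    }
    return true;
}
```

With a limit of 3, "中" (E4 B8 AD) passes, while U+1F600 (F0 9F 98 80) fails; with a limit of 4, both pass.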
FOR COLLATIONS OF UTF8
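Internal struct (MY_CHARSET_INFO, declared in the MySQL C API header mysql.h and filled in by mysql_get_character_set_info()):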
typedef struct character_set
{
    unsigned int number;      /* character set number */
    unsigned int state;       /* character set state */
    const char *csname;       /* collation name */
    const char *name;         /* character set name */
    const char *comment;      /* comment */
    const char *dir;          /* character set directory */
    unsigned int mbminlen;    /* min. length for multibyte strings */
    unsigned int mbmaxlen;    /* max. length for multibyte strings */
} MY_CHARSET_INFO;
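The mbmaxlen field is what makes the worst-case storage size of a string predictable: n characters need at most n * mbmaxlen bytes, which is why a utf8 VARCHAR(255) can take up to 765 bytes. The sketch below redeclares the struct locally so it compiles without mysql.h and fills in only the length fields by hand (mbmaxlen = 3 from the MAXLEN shown earlier); a real client would obtain the values from mysql_get_character_set_info(). The helper `max_bytes_for_chars` and the variable `utf8_demo` are hypothetical.

```c
#include <stddef.h>

/* Local copy of the struct above so this compiles without mysql.h. */
typedef struct character_set
{
    unsigned int number;
    unsigned int state;
    const char *csname;
    const char *name;
    const char *comment;
    const char *dir;
    unsigned int mbminlen;
    unsigned int mbmaxlen;
} MY_CHARSET_INFO;

/* Worst-case number of bytes needed to store n_chars characters. */
size_t max_bytes_for_chars(const MY_CHARSET_INFO *cs, size_t n_chars)
{
    return n_chars * cs->mbmaxlen;
}

/* Hand-filled values for utf8; all other fields are left zero/NULL. */
static const MY_CHARSET_INFO utf8_demo = { .mbminlen = 1, .mbmaxlen = 3 };
```

With `utf8_demo`, 255 characters need up to 765 bytes; a single-byte charset (mbmaxlen = 1) would need only 255.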