About MySQL Character Sets - skykiker - ChinaUnix Blog


System Variables

Character-set-related system variables

mysql> show variables like '%char%';
+--------------------------+------------------------------------------------------------------------------+
| Variable_name            | Value                                                                        |
+--------------------------+------------------------------------------------------------------------------+
| character_set_client     | utf8                                                                         |
| character_set_connection | utf8                                                                         |
| character_set_database   | utf8                                                                         |
| character_set_filesystem | binary                                                                       |
| character_set_results    | utf8                                                                         |
| character_set_server     | utf8                                                                         |
| character_set_system     | utf8                                                                         |
| character_sets_dir       | /usr/local/Percona-Server-5.6.29-rel76.2-Linux.x86_64.ssl101/share/charsets/ |
+--------------------------+------------------------------------------------------------------------------+
8 rows in set (0.01 sec)

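The meaning of each variable is summarized below:

character_set_client: the character set of the SQL statements the client sends to the server
character_set_connection: the default character set for string literals
character_set_database: the default character set of the default database (the one selected with use)
character_set_filesystem: the file system character set, used to interpret string literals that refer to file names
character_set_results: the character set of result sets and error messages
character_set_server: the server's default character set
character_set_system: the character set of system identifiers
character_sets_dir: the directory where character sets are installed

For the detailed definitions, see the official MySQL documentation.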
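To make the roles of character_set_client and character_set_results more concrete, here is a minimal Python sketch of a round trip between client and server. It is only a simulation, not MySQL code: the function name round_trip and its parameters are hypothetical, Python codec names stand in for MySQL character sets, and the intermediate conversion to character_set_connection is left out for brevity.

```python
def round_trip(text, actual_encoding, declared_client, results_charset):
    """Simulate sending a string to the server and reading it back.

    actual_encoding  -- the encoding the client really uses for its bytes
    declared_client  -- stand-in for character_set_client
    results_charset  -- stand-in for character_set_results
    """
    # The client sends raw bytes in its real encoding.
    wire_bytes = text.encode(actual_encoding)
    # The server interprets those bytes as character_set_client says.
    server_text = wire_bytes.decode(declared_client)
    # The server encodes the result according to character_set_results.
    result_bytes = server_text.encode(results_charset)
    # The client decodes the result with its real encoding.
    return result_bytes.decode(actual_encoding, errors="replace")


if __name__ == "__main__":
    # Declared and actual encodings agree: the text survives.
    print(repr(round_trip("中文", "utf-8", "utf-8", "utf-8")))
    # The client sends UTF-8 but the server is told latin-1: garbled text.
    print(repr(round_trip("中文", "utf-8", "latin-1", "utf-8")))
```

When the declared character set matches what the client really sends, the text comes back unchanged. When it does not, each byte of the UTF-8 sequence is treated as a separate character and the result is garbled.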

Collation-related system variables:

mysql> show variables like '%collation%';
+----------------------+-----------------+
| Variable_name        | Value           |
+----------------------+-----------------+
| collation_connection | utf8_general_ci |
| collation_database   | utf8_general_ci |
| collation_server     | utf8_general_ci |
+----------------------+-----------------+
3 rows in set (0.00 sec)

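The collations correspond to the character sets above, so they need no separate explanation. One question that has been debated, though, is whether to use utf8_general_ci or utf8_unicode_ci with the UTF8 encoding. For example:

utf8_general_ci sorts slightly faster, while utf8_unicode_ci sorts more accurately for some languages. However, the speed difference is small enough to ignore, and the cases where the "more accurate" ordering matters are of little relevance to those of us working with Chinese. So in my opinion either setting is fine. Simply go with the flow and leave it unset, letting MySQL pick the default for the character set (utf8_general_ci for utf8).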

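Data Storage

Character data is ultimately stored in character-type columns of tables, so what finally determines storage is the column's character set. The table's character set is merely the default for columns created in it, and the database's character set is merely the default for tables created in it.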
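The chain of defaults described above can be sketched as a simple lookup from the most specific level to the least specific. This is an illustrative model only: the function effective_charset and its parameters are hypothetical, and in MySQL the default is resolved once, when the object is created, so changing a database default later does not change existing tables or columns.

```python
def effective_charset(column=None, table=None, database=None, server="utf8"):
    """Return the charset a new column would get.

    Each argument is the charset explicitly given at that level, or None
    if nothing was specified there. The first explicit value wins,
    checked from column up to server.
    """
    for charset in (column, table, database, server):
        if charset is not None:
            return charset
    raise ValueError("no character set configured at any level")


if __name__ == "__main__":
    # Nothing specified anywhere: fall back to the server default.
    print(effective_charset())
    # The database was created with utf8mb4; table and column inherit it.
    print(effective_charset(database="utf8mb4"))
    # An explicit column charset overrides everything above it.
    print(effective_charset(column="latin1", table="gbk", database="utf8mb4"))
```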

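A Once-and-for-All Character Set Configuration

When it comes to character sets, the main worry is garbled text. The simplest and most effective fix is to use a UTF8 encoding everywhere. Setting character_set_server in the [mysqld] section of my.cnf is enough.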

character_set_server           = utf8mb4

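With this setting, newly created databases and the objects in them default to the 'utf8mb4' encoding;

JDBC clients (Connector/J versions after 5.1.13) choose a suitable client encoding based on the server's character_set_server setting;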

http://dev.mysql.com/doc/relnotes/connector-j/5.1/en/news-5-1-14.html

Connector/J mapped both 3-byte and 4-byte UTF8 encodings to the same Java UTF8 encoding.

To use 3-byte UTF8 with Connector/J set characterEncoding=utf8 and set useUnicode=true in the connection string.

To use 4-byte UTF8 with Connector/J configure the MySQL server with character_set_server=utf8mb4. Connector/J will then use that setting as long as characterEncoding has not been set in the connection string. This is equivalent to autodetection of the character set. (Bug #58232)
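The difference between 3-byte and 4-byte UTF8 mentioned in the release note can be checked for any given string. The following is a small sketch in Python: the helper name needs_utf8mb4 is hypothetical, and it only inspects UTF-8 byte lengths without talking to MySQL.

```python
def needs_utf8mb4(text):
    """Return True if any character of text takes 4 bytes in UTF-8.

    Such characters lie outside the Basic Multilingual Plane and do not
    fit in MySQL's 3-byte utf8 character set.
    """
    return any(len(ch.encode("utf-8")) == 4 for ch in text)


if __name__ == "__main__":
    # Common Chinese characters take 3 bytes each and fit in 3-byte utf8.
    print(len("中".encode("utf-8")), needs_utf8mb4("中文 abc"))
    # An emoji such as U+1F600 takes 4 bytes and requires utf8mb4.
    print(len("\U0001F600".encode("utf-8")), needs_utf8mb4("\U0001F600"))
```

A check like this can help decide whether existing data would be affected when a column still uses the 3-byte utf8 character set.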

The collation needs no separate setting; let it follow the encoding.
