半个PostgreSQL DBA,热衷于数据库相关的技术。我的ppt分享https://pan.baidu.com/s/1eRQsdAa https://github.com/chenhuajun https://chenhuajun.github.io
分类: Mysql/postgreSQL
2017-01-01 20:28:18
字符集相关的系统变量
mysql> show variables like '%char%'; +--------------------------+------------------------------------------------------------------------------+ | Variable_name | Value | +--------------------------+------------------------------------------------------------------------------+ | character_set_client | utf8 | | character_set_connection | utf8 | | character_set_database | utf8 | | character_set_filesystem | binary | | character_set_results | utf8 | | character_set_server | utf8 | | character_set_system | utf8 | | character_sets_dir | /usr/local/Percona-Server-5.6.29-rel76.2-Linux.x86_64.ssl101/share/charsets/ | +--------------------------+------------------------------------------------------------------------------+ 8 rows in set (0.01 sec)
各个变量的含义概述如下:
详细定义参考官网说明:
排序规则相关的系统变量:
mysql> show variables like '%collation%'; +----------------------+-----------------+ | Variable_name | Value | +----------------------+-----------------+ | collation_connection | utf8_general_ci | | collation_database | utf8_general_ci | | collation_server | utf8_general_ci | +----------------------+-----------------+ 3 rows in set (0.00 sec)
排序规则和上面的字符集是对应的,就不解释了。但有一个问题,UTF8编码下该设置utf8_general_ci 还是utf8_unicode_ci有一些讨论。 比如:
utf8_general_ci排序略快一些,utf8_unicode_ci对某些语义排序更准确。然而,所谓的"更快",快的程度可以无视;"更准确"所适用的场景对使用中文的我们没啥意义。所以个人认为设啥都没关系,干脆顺气自然不设,让MySQL自己根据字符集选择缺省值吧(即utf8_general_ci)。
字符数据的最终存储到表的字符类型的列上,所以存储的最终体现形式是列的字符集。至于表的字符集不过是生成列时的缺省字符集;数据库的字符集不过建表时的缺省字符集。
谈到字符主要让人操心的是乱码问题,最简单有效的解决办法是统一设置UTF8编码。 只要在my.cnf的[mysqld]上设置character_set_server即可。
character_set_server = utf8mb4
这样,新创建的数据库和该数据库中的对象将默认采用'utf8mb4'编码;
JDBC(5.1.13以后版本)客户端将根据服务端的character_set_server设置合适的客户端编码;
http://dev.mysql.com/doc/relnotes/connector-j/5.1/en/news-5-1-14.html
Connector/J mapped both 3-byte and 4-byte UTF8 encodings to the same Java UTF8 encoding. To use 3-byte UTF8 with Connector/J set characterEncoding=utf8 and set useUnicode=true in the connection string. To use 4-byte UTF8 with Connector/J configure the MySQL server with character_set_server=utf8mb4. Connector/J will then use that setting as long as characterEncoding has not been set in the connection string. This is equivalent to autodetection of the character set. (Bug #58232)
排序规则无需专门设置,让它跟随编码自己变化。