半个PostgreSQL DBA,热衷于数据库相关的技术。我的ppt分享https://pan.baidu.com/s/1eRQsdAa https://github.com/chenhuajun https://chenhuajun.github.io
分类: Mysql/postgreSQL
2017-03-28 12:07:21
对于中文用户,在PostgreSQL中应该将编码无条件的设为UTF8,为简化和统一区域(loacle)也推荐尽量设置为C,但Collate和Ctype对性能或功能有一定影响,需要注意。
en_US=# \l+ List of databases Name | Owner | Encoding | Collate | Ctype | Access privileges | Size | Tablespace | Description -----------+----------+----------+------------+------------+-----------------------+---------+------------+-------------------------------------------- en_US | postgres | UTF8 | en_US.UTF8 | en_US.UTF8 | | 7343 kB | pg_default | postgres | postgres | UTF8 | C | C | | 414 MB | pg_default | default administrative connection database template0 | postgres | UTF8 | C | C | =c/postgres +| 7225 kB | pg_default | unmodifiable empty database | | | | | postgres=CTc/postgres | | | template1 | postgres | UTF8 | C | C | =c/postgres +| 7225 kB | pg_default | default template for new databases | | | | | postgres=CTc/postgres | | | zh_CN | postgres | UTF8 | zh_CN.UTF8 | zh_CN.UTF8 | | 7225 kB | pg_default | (5 rows)
Collate会影响中文的排序,在zh_CN的区域下中文按拼音排序,其它区域按字符编码排序。
postgres=# select * from (values ('王'),('貂'),('西'),('杨')) a order by a; column1 --------- 杨 王 西 貂 (4 rows) postgres=# \c en_US You are now connected to database "en_US" as user "postgres". en_US=# select * from (values ('王'),('貂'),('西'),('杨')) a order by a; column1 --------- 杨 王 西 貂 (4 rows) en_US=# \c zh_CN You are now connected to database "zh_CN" as user "postgres". zh_CN=# select * from (values ('王'),('貂'),('西'),('杨')) a order by a; column1 --------- 貂 王 西 杨 (4 rows)
postgres=# create table tb1(c1 text); CREATE TABLE Time: 5.653 ms postgres=# insert into tb1 select md5(generate_series(1,1000000)::text); INSERT 0 1000000 Time: 2671.929 ms postgres=# vacuum ANALYZE tb1; VACUUM Time: 398.817 ms postgres=# select * from tb1 order by c1 limit 1; c1 ---------------------------------- 0000104cd168386a335ba6bf6e32219d (1 row) Time: 176.779 ms postgres=# create index idx1 on tb1(c1); CREATE INDEX Time: 1549.436 ms
Collate/Ctype C en_US.UTF8 zh_CN.UTF8 insert 2671 2613 2670 vacuum ANALYZE 398 250 396 order by 176 388 401 create index 1549 7492 7904 insert(with index) 11199 15621 16128
Ctype会影响pg_trgm和部分正则匹配的结果,比如Ctype为'C'时,pg_trgm将无法支持中文
postgres=# select show_trgm('aaabbbc到的x'); show_trgm ----------------------------------------------------- {" a"," x"," aa"," x ",aaa,aab,abb,bbb,bbc,"bc "} (1 row) en_US=# select show_trgm('aaabbbc到的x'); show_trgm ----------------------------------------------------------------------- {" a"," aa",0x27bdf1,0x30bd19,0x4624bc,aaa,aab,abb,bbb,bbc,0x6a2ad5} (1 row) zh_CN=# select show_trgm('aaabbbc到的x'); show_trgm ----------------------------------------------------------------------- {" a"," aa",0x27bdf1,0x30bd19,0x4624bc,aaa,aab,abb,bbb,bbc,0x6a2ad5} (1 row)
对性能要求不高的场景建议将Collate和Ctype都设置为zh_CN.UTF8,其它区域设置为C。
initdb -E UTF8 --locale=C --lc-collate=zh_CN.UTF8 --lc-ctype=zh_CN.UTF8 ...
对性能要求较高的场景建议将Ctype设置为zh_CN.UTF8,其它区域设置为C。如果有部分查询需要按拼音排序,可在列定义和SQL表达式中指定Collate为zh_CN。
initdb -E UTF8 --locale=C --lc-ctype=zh_CN.UTF8 ...