半个PostgreSQL DBA,热衷于数据库相关的技术。我的ppt分享https://pan.baidu.com/s/1eRQsdAa https://github.com/chenhuajun https://chenhuajun.github.io
分类: Mysql/postgreSQL
2017-03-24 02:46:16
citus对支持的SQL有一定的限制,其中包括最常见的join,具体如下
无限制。根据情况会以下面几种方式之一支持
与inner join的主要不同是不支持数据重分布,因此无法支持两个分片规则不一致的大表的outer join。 另外,参考表只有出现在left join的右边或right join的左边才被支持。
下面演示2个大表的join
master和worker都在1台机器上,端口号不同
在master上添加worker节点
SELECT * from master_add_node('/tmp', 60002); SELECT * from master_add_node('/tmp', 60003);
postgres=# create table tb1(id int,k int); CREATE TABLE postgres=# create table tb2(id int,k int); CREATE TABLE postgres=# select create_distributed_table('tb1','id'); create_distributed_table -------------------------- (1 行记录) postgres=# select create_distributed_table('tb2','id'); create_distributed_table -------------------------- (1 行记录)
postgres=# create unlogged table tbx as select id,id ss from ( select generate_series(1,1000000) id) a; SELECT 1000000 时间:414.776 ms postgres=# copy tb1 from PROGRAM 'psql -p60001 -Atc "copy tbx to STDOUT"'; COPY 1000000 时间:748.383 ms postgres=# copy tb2 from PROGRAM 'psql -p60001 -Atc "copy tbx to STDOUT"'; COPY 1000000 时间:757.981 ms
postgres=# select count(1) from tb1 join tb2 on(tb1.id=tb2.id); count --------- 1000000 (1 行记录) 时间:1889.941 ms
执行计划
postgres=# explain select count(1) from tb1 join tb2 on(tb1.id=tb2.id); QUERY PLAN -------------------------------------------------------------------------------------------------- Distributed Query into pg_merge_job_0033 Executor: Task-Tracker Task Count: 32 Tasks Shown: One of 32 -> Task Node: host=/tmp port=60002 dbname=postgres -> Aggregate (cost=1818.06..1818.07 rows=1 width=8) -> Hash Join (cost=849.88..1739.19 rows=31550 width=0) Hash Cond: (tb1.id = tb2.id) -> Seq Scan on tb1_102008 tb1 (cost=0.00..455.50 rows=31550 width=4) -> Hash (cost=455.50..455.50 rows=31550 width=4) -> Seq Scan on tb2_102040 tb2 (cost=0.00..455.50 rows=31550 width=4) Master Query -> Aggregate (cost=0.00..0.00 rows=0 width=0) -> Seq Scan on pg_merge_job_0033 (cost=0.00..0.00 rows=0 width=0) (15 行记录) 时间:14.952 ms
postgres=# select count(1) from tb1 join tb2 on(tb1.id=tb2.k); ERROR: cannot use real time executor with repartition jobs 提示: Set citus.task_executor_type to "task-tracker". 时间:16.238 ms
默认的real-time执行器不支持这种join,先设置执行器为'task-tracker'
set citus.task_executor_type='task-tracker'
再执行SQL
postgres=# select count(1) from tb1 join tb2 on(tb1.id=tb2.k); count --------- 1000000 (1 行记录) 时间:16339.376 ms postgres=# select count(1) from tb1 join tb2 on(tb1.k=tb2.k); count --------- 1000000 (1 行记录) 时间:16263.971 ms
16秒完成2个100w大表的join效率也不低了。
执行计划如下:
postgres=# explain select count(1) from tb1 join tb2 on(tb1.k=tb2.k); QUERY PLAN ----------------------------------------------------------------------------- Distributed Query into pg_merge_job_0036 Executor: Task-Tracker Task Count: 8 Tasks Shown: None, not supported for re-partition queries -> MapMergeJob Map Task Count: 32 Merge Task Count: 8 -> MapMergeJob Map Task Count: 32 Merge Task Count: 8 Master Query -> Aggregate (cost=0.00..0.00 rows=0 width=0) -> Seq Scan on pg_merge_job_0036 (cost=0.00..0.00 rows=0 width=0) (13 行记录) 时间:22.865 ms postgres=# explain select count(1) from tb1 join tb2 on(tb1.k=tb2.k); QUERY PLAN ----------------------------------------------------------------------------- Distributed Query into pg_merge_job_0039 Executor: Task-Tracker Task Count: 8 Tasks Shown: None, not supported for re-partition queries -> MapMergeJob Map Task Count: 32 Merge Task Count: 8 -> MapMergeJob Map Task Count: 32 Merge Task Count: 8 Master Query -> Aggregate (cost=0.00..0.00 rows=0 width=0) -> Seq Scan on pg_merge_job_0039 (cost=0.00..0.00 rows=0 width=0) (13 行记录) 时间:21.905 ms
join列和分片列一致时可以支持
postgres=# select count(1) from tb1 left join tb2 on(tb1.id=tb2.id); count --------- 1000000 (1 行记录) 时间:1929.182 ms
join列和分片列不一致时不支持
postgres=# select count(1) from tb1 left join tb2 on(tb1.id=tb2.k); ERROR: cannot run outer join query if join is not on the partition column 描述: Outer joins requiring repartitioning are not supported. 时间:0.268 ms
创建参考表tb3
postgres=# create table tb3(id int,k int); CREATE TABLE 时间:0.758 ms postgres=# select create_reference_table('tb3'); create_reference_table ------------------------ (1 行记录) 时间:28.051 ms
参考表在left join右边时可以支持
postgres=# select count(1) from tb1 left join tb3 on(tb1.k=tb3.k); count --------- 1000000 (1 行记录) 时间:1942.156 ms
参考表在left join左边时不支持
postgres=# select count(1) from tb3 left join tb1 on(tb1.k=tb3.k); ERROR: cannot run outer join query if join is not on the partition column 描述: Outer joins requiring repartitioning are not supported. 时间:0.183 ms
right join正好相反
postgres=# select count(1) from tb3 right join tb1 on(tb1.k=tb3.k); count --------- 1000000 (1 行记录) 时间:2155.268 ms postgres=# select count(1) from tb1 right join tb3 on(tb1.k=tb3.k); ERROR: cannot run outer join query if join is not on the partition column 描述: Outer joins requiring repartitioning are not supported. 时间:0.348 ms
full join不支持
postgres=# select count(1) from tb1 full join tb3 on(tb1.k=tb3.k); ERROR: cannot run outer join query if join is not on the partition column 描述: Outer joins requiring repartitioning are not supported. 时间:0.180 ms postgres=# select count(1) from tb3 full join tb1 on(tb1.k=tb3.k); ERROR: cannot run outer join query if join is not on the partition column 描述: Outer joins requiring repartitioning are not supported. 时间:0.163 ms