半个PostgreSQL DBA,热衷于数据库相关的技术。我的ppt分享https://pan.baidu.com/s/1eRQsdAa https://github.com/chenhuajun https://chenhuajun.github.io
分类: Mysql/postgreSQL
2018-10-04 01:37:03
citus.limit_clause_row_fetch_count是citus的一个性能优化的参数。具体适应于什么场景呢?
下面是官方文档说明,还是不够具体。
https://docs.citusdata.com/en/v7.3/develop/api_guc.html?highlight=limit_clause_row_fetch_count
Planner Configuration citus.limit_clause_row_fetch_count (integer) Sets the number of rows to fetch per task for limit clause optimization. In some cases, select queries with limit clauses may need to fetch all rows from each task to generate results. In those cases, and where an approximation would produce meaningful results, this configuration value sets the number of rows to fetch from each shard. Limit approximations are disabled by default and this parameter is set to -1. This value can be set at run-time and is effective on the coordinator.
从citus的测试用例中,可以清楚的看到它的作用。
citus-7.2.1\src\test\regress\expected\multi_limit_clause_approximate.out
-- Enable limit optimization to fetch one third of each shard's data SET citus.limit_clause_row_fetch_count TO 600; SELECT l_partkey, sum(l_partkey * (1 + l_suppkey)) AS aggregate FROM lineitem GROUP BY l_partkey ORDER BY aggregate DESC LIMIT 10; DEBUG: push down of limit count: 600 l_partkey | aggregate -----------+------------ 194541 | 3727794642 160895 | 3671463005 183486 | 3128069328 179825 | 3093889125 162432 | 2834113536 153937 | 2761321906 199283 | 2726988572 185925 | 2672114100 196629 | 2622637602 157064 | 2614644408 (10 rows)
上面的SQL,如果不加SET citus.limit_clause_row_fetch_count TO 600,CN需要到worker上把所有数据都捞出来,然后再在CN上排序取TopN结果。 大数据量的情况,性能会非常糟糕。加上SET citus.limit_clause_row_fetch_count TO 600,就只会到每个worker上取前600的记录。但可能会带来准确性的损失。
另外一个需要注意的是,上面的GROUP BY字段l_partkey不是分片字段,如果GROUP BY字段已经包含了分片字段,不需要这个优化,因为这种情况下可以直接把LIMIT下推下去。
另一个测试用例,形式类似
... SET citus.limit_clause_row_fetch_count TO 150; SET citus.large_table_shard_count TO 2; SELECT c_custkey, c_name, count(*) as lineitem_count FROM customer, orders, lineitem WHERE c_custkey = o_custkey AND l_orderkey = o_orderkey GROUP BY c_custkey, c_name ORDER BY lineitem_count DESC, c_custkey LIMIT 10; DEBUG: push down of limit count: 150 c_custkey | c_name | lineitem_count -----------+--------------------+---------------- 43 | Customer#000000043 | 42 370 | Customer#000000370 | 38 79 | Customer#000000079 | 37 689 | Customer#000000689 | 36 472 | Customer#000000472 | 35 685 | Customer#000000685 | 35 643 | Customer#000000643 | 34 226 | Customer#000000226 | 33 496 | Customer#000000496 | 32 304 | Customer#000000304 | 31 (10 rows)
适用场景
不适用场景