PostgreSQL中1000分区的继承分区表更新失败的原因-skykiker-ChinaUnix博客

博客访问： 2971140
博文数量： 199
博客积分： 1400
博客等级：上尉
技术积分： 4126
用户组：普通用户
注册时间： 2008-07-06 19:06

个人简介

半个PostgreSQL DBA，热衷于数据库相关的技术。我的ppt分享https://pan.baidu.com/s/1eRQsdAa https://github.com/chenhuajun https://chenhuajun.github.io

文章分类

全部博文（199）

其他（1）
citus（10）
greenlpum（1）
安全（1）
Pacemaker（3）
MySQL（21）
Symfoware（2）

Native（1）
分布式（0）
C（1）
Solaris（1）
Linux（11）
C#（3）
故障案例（5）
NoSQL（4）
云计算（1）
Windows（3）
Database（13）
PostgreSQL（101）

安装配置（1）

HA（3）

doc（6）

Npgsql（1）

psqlODBC（2）
嵌入式开发（8）
Java开发（2）
生活随笔（3）
未分配的博文（4）

文章存档

2020年（5）

2019年（1）

2018年（12）

2017年（23）

2016年（43）

2015年（51）

2014年（27）

2013年（21）

2011年（1）

2010年（4）

2009年（5）

2008年（6）

我的朋友

1.错误现象

命令行(SQL执行失败）：

点击(此处)折叠或打开

-bash-4.1$ pgbench -n -r -c 10 -j 10 -T 2 -f update.sql db1000
Client 2 aborted in state 1. Probably the backend died while processing.
WARNING: terminating connection because of crash of another server process
DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT: In a moment you should be able to reconnect to the database and repeat your command.
WARNING: terminating connection because of crash of another server process
DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT: In a moment you should be able to reconnect to the database and repeat your command.
WARNING: terminating connection because of crash of another server process
DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT: In a moment you should be able to reconnect to the database and repeat your command.
WARNING: terminating connection because of crash of another server process
DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT: In a moment you should be able to reconnect to the database and repeat your command.
WARNING: terminating connection because of crash of another server process
DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT: In a moment you should be able to reconnect to the database and repeat your command.
WARNING: terminating connection because of crash of another server process
DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT: In a moment you should be able to reconnect to the database and repeat your command.
WARNING: terminating connection because of crash of another server process
DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT: In a moment you should be able to reconnect to the database and repeat your command.
WARNING: terminating connection because of crash of another server process
DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT: In a moment you should be able to reconnect to the database and repeat your command.
WARNING: terminating connection because of crash of another server process
DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT: In a moment you should be able to reconnect to the database and repeat your command.
Client 3 aborted in state 1. Probably the backend died while processing.
Client 0 aborted in state 1. Probably the backend died while processing.
Client 9 aborted in state 1. Probably the backend died while processing.
Client 8 aborted in state 1. Probably the backend died while processing.
Client 7 aborted in state 1. Probably the backend died while processing.
Client 1 aborted in state 1. Probably the backend died while processing.
Client 4 aborted in state 1. Probably the backend died while processing.
Client 6 aborted in state 1. Probably the backend died while processing.
Client 5 aborted in state 1. Probably the backend died while processing.
transaction type: Custom query
scaling factor: 1
query mode: simple
number of clients: 10
number of threads: 10
duration: 2 s
number of transactions actually processed: 0
tps = 0.000000 (including connections establishing)
tps = 0.000000 (excluding connections establishing)
statement latencies in milliseconds:
0.271800 \setrandom id 1 10000000
0.000000 update maintb set name = 'aaaaa12345' where id=:id;

pg服务端日志(服务端进程被杀并重启):

点击(此处)折叠或打开

LOG: server process (PID 18565) was terminated by signal 9: Killed
DETAIL: Failed process was running: update maintb set name = 'aaaaa12345' where id=150458;
LOG: terminating any other active server processes
WARNING: terminating connection because of crash of another server process
DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT: In a moment you should be able to reconnect to the database and repeat your command.
WARNING: terminating connection because of crash of another server process
DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT: In a moment you should be able to reconnect to the database and repeat your command.
WARNING: terminating connection because of crash of another server process
DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT: In a moment you should be able to reconnect to the database and repeat your command.
WARNING: terminating connection because of crash of another server process
DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT: In a moment you should be able to reconnect to the database and repeat your command.
WARNING: terminating connection because of crash of another server process
DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT: In a moment you should be able to reconnect to the database and repeat your command.
WARNING: terminating connection because of crash of another server process
DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT: In a moment you should be able to reconnect to the database and repeat your command.
WARNING: terminating connection because of crash of another server process
DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT: In a moment you should be able to reconnect to the database and repeat your command.
WARNING: terminating connection because of crash of another server process
DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT: In a moment you should be able to reconnect to the database and repeat your command.
WARNING: terminating connection because of crash of another server process
DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT: In a moment you should be able to reconnect to the database and repeat your command.
WARNING: terminating connection because of crash of another server process
DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT: In a moment you should be able to reconnect to the database and repeat your command.
WARNING: terminating connection because of crash of another server process
DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT: In a moment you should be able to reconnect to the database and repeat your command.
LOG: all server processes terminated; reinitializing
LOG: database system was interrupted; last known up at 2015-01-16 14:18:26 CST
LOG: database system was not properly shut down; automatic recovery in progress
LOG: redo starts at 3/4A153AC8
LOG: record with zero length at 3/4AD01498
LOG: redo done at 3/4AD01468
LOG: last completed transaction was at log time 2015-01-16 14:35:11.990522+08
LOG: database system is ready to accept connections
LOG: autovacuum launcher started

/var/log/messages(OOM):

点击(此处)折叠或打开

Jan 16 14:35:40 hanode1 kernel: Out of memory: Kill process 18565 (postgres) score 94 or sacrifice child
Jan 16 14:35:40 hanode1 kernel: Killed process 18565, UID 26, (postgres) total-vm:608864kB, anon-rss:150480kB, file-rss:3180kB

2.PG进程究竟分配了多少内存?

通过debug PG9.4的代码，发现生成分区表的1个update语句的执行计划居然耗费了717M的内存。

sql:

点击(此处)折叠或打开

db1000=# update maintb set name='aaaaa12345' where id =3;
UPDATE 19

src/backend/tcop/postgres.c:

点击(此处)折叠或打开

/*
* OK to analyze, rewrite, and plan this query.
*
* Switch to appropriate context for constructing querytrees (again,
* these must outlive the execution context).
*/
oldcontext = MemoryContextSwitchTo(MessageContext);
querytree_list = pg_analyze_and_rewrite(parsetree, query_string,
NULL, 0);
plantree_list = pg_plan_queries(querytree_list, 0, NULL);
/* Done with the snapshot used for parsing/planning */
if (snapshot_set)
PopActiveSnapshot();

在上面的pg_plan_queries()调用前后分别在gdb中执行“call MemoryContextStats(MessageContext)”，从输出结果可以看出pg_plan_queries分配了717M内存。

点击(此处)折叠或打开

MessageContext: 16384 total in 2 blocks; 9976 free (9 chunks); 6408 used
MessageContext: 721420288 total in 96 blocks; 3715776 free (13 chunks); 717704512 used

1个客户端700多M，10个客户端并行执行就是7个G，我的VM只有2G内存，难怪被kill掉。

3.问题在哪里？

同样的方法，测量了另外两种情况的内存消耗。
100个分区的update
sql:

点击(此处)折叠或打开

db100=# update maintb set name='aaaaa12345' where id =3;
UPDATE 8

生成执行计划消耗的内存:8M

点击(此处)折叠或打开

MessageContext: 24576 total in 2 blocks; 18168 free (9 chunks); 6408 used
MessageContext: 8380416 total in 10 blocks; 5056 free (3 chunks); 8375360 used

从8M到700M，不难看出：分区表update的执行计划消耗内存与分区数目n是n^2的倍数关系。

进一步细化，发现关键点在下面两处内存分配。
src/backend/optimizer/plan/planner.c

点击(此处)折叠或打开

static Plan *
inheritance_planner(PlannerInfo *root)
{
...
foreach(lc, root->append_rel_list)//循环1001次
{
AppendRelInfo *appinfo = (AppendRelInfo *) lfirst(lc);
PlannerInfo subroot;
Plan *subplan;
Index rti;
/* append_rel_list contains all append rels; ignore others */
if (appinfo->parent_relid != parentRTindex)
continue;
/*
* We need a working copy of the PlannerInfo so that we can control
* propagation of information back to the main copy.
*/
memcpy(&subroot, root, sizeof(PlannerInfo));
/*
* Generate modified query with this rel as target. We first apply
* adjust_appendrel_attrs, which copies the Query and changes
* references to the parent RTE to refer to the current child RTE,
* then fool around with subquery RTEs.
*/
subroot.parse = (Query *)
adjust_appendrel_attrs(root,
(Node *) parse,
appinfo);//大约分配了300K
/*
* The rowMarks list might contain references to subquery RTEs, so
* make a copy that we can apply ChangeVarNodes to. (Fortunately, the
* executor doesn't need to see the modified copies --- we can just
* pass it the original rowMarks list.)
*/
subroot.rowMarks = (List *) copyObject(root->rowMarks);
/*
* The append_rel_list likewise might contain references to subquery
* RTEs (if any subqueries were flattenable UNION ALLs). So prepare
* to apply ChangeVarNodes to that, too.
*/
subroot.append_rel_list = (List *) copyObject(root->append_rel_list);//大约分配了400K
/*
* Add placeholders to the child Query's rangetable list to fill the
* RT indexes already reserved for subqueries in previous children.
* These won't be referenced, so there's no need to make them very
* valid-looking.
*/
while (list_length(subroot.parse->rtable) < list_length(final_rtable))
subroot.parse->rtable = lappend(subroot.parse->rtable,
makeNode(RangeTblEntry));
...
}

4. select为什么没有问题？

select和update走的是不同的分支，没有走上面的inheritance_planner()函数，内存消耗只有7M。

1000个分区的select
sql:

点击(此处)折叠或打开

db1000=# select * from maintb where id =3;
id | name
----+------------
3 | aaaaa12345
3 | aaaaa12345
3 | aaaaa12345
3 | aaaaa12345
3 | aaaaa12345
3 | aaaaa12345
3 | aaaaa12345
3 | aaaaa12345
3 | aaaaa12345
3 | aaaaa12345
3 | aaaaa12345
3 | aaaaa12345
3 | aaaaa12345
3 | aaaaa12345
3 | aaaaa12345
3 | aaaaa12345
3 | aaaaa12345
3 | aaaaa12345
3 | aaaaa12345
(19 rows)

生成执行计划消耗的内存:7M

点击(此处)折叠或打开

MessageContext: 8192 total in 1 blocks; 1864 free (1 chunks); 6328 used
MessageContext: 8388608 total in 11 blocks; 1176776 free (6 chunks); 7211832 used

update的调用栈：
点击(此处)折叠或打开

(gdb) bt
#0 inheritance_planner (root=0x1fd6548) at planner.c:787
#1 0x00000000005fe7c0 in subquery_planner (glob=0x1fd64b8, parse=0x1fd59e0, parent_root=<value optimized out>,
hasRecursion=<value optimized out>, tuple_fraction=0, subroot=0x7fffbf4c70b8) at planner.c:569
#2 0x00000000005fe935 in standard_planner (parse=0x1fd59e0, cursorOptions=0, boundParams=0x0) at planner.c:210
#3 0x00000000006784fa in pg_plan_query (querytree=<value optimized out>, cursorOptions=<value optimized out>,
boundParams=<value optimized out>) at postgres.c:750
#4 0x00000000006785e4 in pg_plan_queries (querytrees=<value optimized out>, cursorOptions=0, boundParams=0x0)
at postgres.c:809
#5 0x0000000000678bcc in exec_simple_query (
query_string=0x1fd4b30 "update maintb set name='aaaaa12345' where id =3;") at postgres.c:974
#6 0x000000000067a179 in PostgresMain (argc=<value optimized out>, argv=<value optimized out>,
dbname=0x1f70fb0 "db1000", username=<value optimized out>) at postgres.c:4016
#7 0x00000000006283cb in BackendRun (argc=<value optimized out>, argv=<value optimized out>) at postmaster.c:4123
#8 BackendStartup (argc=<value optimized out>, argv=<value optimized out>) at postmaster.c:3797
#9 ServerLoop (argc=<value optimized out>, argv=<value optimized out>) at postmaster.c:1576
#10 PostmasterMain (argc=<value optimized out>, argv=<value optimized out>) at postmaster.c:1223
#11 0x00000000005c0cd8 in main (argc=3, argv=0x1f6fa10) at main.c:227

select的调用栈:

点击(此处)折叠或打开

(gdb) bt
#0 grouping_planner (root=0x1fd6568, tuple_fraction=0) at planner.c:1080
#1 0x00000000005fe5b4 in subquery_planner (glob=0x1fd64d8, parse=0x1fd5b20, parent_root=<value optimized out>,
hasRecursion=<value optimized out>, tuple_fraction=0, subroot=0x7fffbf4c70b8) at planner.c:572
#2 0x00000000005fe935 in standard_planner (parse=0x1fd5b20, cursorOptions=0, boundParams=0x0) at planner.c:210
#3 0x00000000006784fa in pg_plan_query (querytree=<value optimized out>, cursorOptions=<value optimized out>,
boundParams=<value optimized out>) at postgres.c:750
#4 0x00000000006785e4 in pg_plan_queries (querytrees=<value optimized out>, cursorOptions=0, boundParams=0x0)
at postgres.c:809
#5 0x0000000000678bcc in exec_simple_query (query_string=0x1fd4b30 "select * from maintb where id =3;")
at postgres.c:974
#6 0x000000000067a179 in PostgresMain (argc=<value optimized out>, argv=<value optimized out>,
dbname=0x1f70fb0 "db1000", username=<value optimized out>) at postgres.c:4016
#7 0x00000000006283cb in BackendRun (argc=<value optimized out>, argv=<value optimized out>) at postmaster.c:4123
#8 BackendStartup (argc=<value optimized out>, argv=<value optimized out>) at postmaster.c:3797
#9 ServerLoop (argc=<value optimized out>, argv=<value optimized out>) at postmaster.c:1576
#10 PostmasterMain (argc=<value optimized out>, argv=<value optimized out>) at postmaster.c:1223
#11 0x00000000005c0cd8 in main (argc=3, argv=0x1f6fa10) at main.c:227

5.最后

按目前PG对分区表的处理逻辑，确实处理不了很多分区的情况。（但是update的那2处内存分配应该是可以优化的）。

6.参考

http://blog.chinaunix.net/xmlrpc.php?r=blog/article&uid=20726500&id=4732253

阅读(5319) | 评论(1) | 转发(1) |

上一篇：PostgreSQL中的区域和编码

下一篇：高并发pgbench测试报错:Resource temporarily unavailable

给主人留下些什么吧！~~

skykiker2015-06-22 11:17:51

这个问题由于产生了LevelDown，已经引起了社区的重视，几位大神已经提出了优化的Patch。详见 http://www.postgresql.org/message-id/CAEepm=1-3vRARS-jWiH-VZShXGSFBZzw=-WhE2Ui4msZxDWLyw@mail.gmail.com

回复 | 举报

感谢所有关心和支持过ChinaUnix的朋友们

16024965号-6