Ceph: resolving the "requests are blocked" warning - u0402 - ChinaUnix blog

Category: Servers and Storage

2016-08-31 18:39:06

Reposted from: http://blog.csdn.net/signmem/article/details/50546919

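I have recently been expanding a Ceph cluster.
Note: if circumstances allow, avoid expanding an existing pool and create a new pool instead; this avoids many anomalies and much of the impact on clients.

The expansion adds roughly 2 TB of capacity per day. While capacity is being added, the placement groups (PGs) migrate data automatically, and during that migration the following warning can appear: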

[root@hh-yun-puppet-129021 ~]# ceph -s
    cluster dc4f91c1-8792-4948-b68f-2fcea75f53b9
     health HEALTH_WARN 1 requests are blocked > 32 sec
     monmap e3: 5 mons at {hh-yun-ceph-cinder015-128055=240.30.128.55:6789/0,hh-yun-ceph-cinder017-128057=240.30.128.57:6789/0,hh-yun-ceph-cinder024-128074=240.30.128.74:6789/0,hh-yun-ceph-cinder025-128075=240.30.128.75:6789/0,hh-yun-ceph-cinder026-128076=240.30.128.76:6789/0}, election epoch 216, quorum 0,1,2,3,4 hh-yun-ceph-cinder015-128055,hh-yun-ceph-cinder017-128057,hh-yun-ceph-cinder024-128074,hh-yun-ceph-cinder025-128075,hh-yun-ceph-cinder026-128076
     osdmap e97975: 190 osds: 190 up, 190 in
      pgmap v13666786: 20544 pgs, 2 pools, 77479 GB data, 19508 kobjects
            228 TB used, 426 TB / 654 TB avail
               20542 active+clean
                   2 active+clean+scrubbing+deep
  client io 47657 kB/s rd, 164 MB/s wr, 5406 op/s

	
	
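Note: "1 requests are blocked > 32 sec" can occur when a client is accessing a block of data while that data is being migrated. If the data moves to another OSD before the access completes, the request is left blocked, and this does affect users.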

Solution
Find the blocked request

[root@hh-yun-puppet-129021 ~]# ceph health detail
HEALTH_WARN 1 requests are blocked > 32 sec; 1 osds have slow requests
1 ops are blocked > 33554.4 sec
1 ops are blocked > 33554.4 sec on osd.16
1 osds have slow requests

	
	
The output shows that osd.16 has one blocked operation.
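
On a cluster with many OSDs it can help to pull the affected OSD ids out of the text automatically. The following is a minimal sketch, assuming the plain-text `ceph health detail` format shown above; the function name `blocked_osds` and the regular expression are illustrative, and other Ceph releases may word these lines differently.

```python
import re

# Sample text copied from the `ceph health detail` output shown above.
SAMPLE = """\
HEALTH_WARN 1 requests are blocked > 32 sec; 1 osds have slow requests
1 ops are blocked > 33554.4 sec
1 ops are blocked > 33554.4 sec on osd.16
1 osds have slow requests
"""

# Matches per-OSD lines such as "1 ops are blocked > 33554.4 sec on osd.16".
# Assumption: this pattern is tied to the output format in this article.
BLOCKED_RE = re.compile(r"^(\d+) ops are blocked > ([\d.]+) sec on osd\.(\d+)$")


def blocked_osds(health_detail):
    """Return {osd_id: (op_count, age_threshold_seconds)} for OSDs with blocked ops."""
    result = {}
    for line in health_detail.splitlines():
        m = BLOCKED_RE.match(line.strip())
        if m:
            count, age, osd_id = int(m.group(1)), float(m.group(2)), int(m.group(3))
            prev_count, prev_age = result.get(osd_id, (0, 0.0))
            # Sum the op counts and keep the largest age threshold seen per OSD.
            result[osd_id] = (prev_count + count, max(prev_age, age))
    return result


if __name__ == "__main__":
    for osd_id, (count, age) in sorted(blocked_osds(SAMPLE).items()):
        print(f"osd.{osd_id}: {count} blocked op(s), older than {age / 3600:.1f} h")
```

With the sample above, the single blocked operation on osd.16 has been waiting for more than 33554.4 seconds, which is a little over nine hours.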
Solution
Find the host that owns the OSD

[root@hh-yun-puppet-129021 ~]# ceph osd tree
# id weight type name  up/down reweight
-1  598    root default
-2  40         host hh-yun-ceph-cinder015-128055
0   4              osd.0   up   1
1   4              osd.1   up   1
2   4              osd.2   up   1
3   4              osd.3   up   1
4   4              osd.4   up   1
5   4              osd.5   up   1
6   4              osd.6   up   1
7   4              osd.7   up   1
8   4              osd.8   up   1
9   4              osd.9   up   1
-3  40         host hh-yun-ceph-cinder016-128056
10  4              osd.10  up   1
11  4              osd.11  up   1
12  4              osd.12  up   1
13  4              osd.13  up   1
14  4              osd.14  up   1
15  4              osd.15  up   1
16  4              osd.16  up   1
17  4              osd.17  up   1
(remaining output omitted)
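
In the tree output each OSD belongs to the nearest `host` line above it, so osd.16 lives on hh-yun-ceph-cinder016-128056. A small sketch of that lookup follows, assuming the column layout shown in this article (id, weight, type or name); the function name `osd_to_host` and the shortened sample are illustrative only.

```python
# Shortened sample in the same layout as the `ceph osd tree` output above.
SAMPLE_TREE = """\
# id weight type name  up/down reweight
-1  598    root default
-2  40         host hh-yun-ceph-cinder015-128055
0   4              osd.0   up   1
-3  40         host hh-yun-ceph-cinder016-128056
16  4              osd.16  up   1
17  4              osd.17  up   1
"""


def osd_to_host(tree_text):
    """Map each 'osd.N' name to the most recent 'host' line above it."""
    mapping = {}
    current_host = None
    for line in tree_text.splitlines():
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue  # skip blank lines and the header comment
        if len(fields) >= 4 and fields[2] == "host":
            current_host = fields[3]  # remember the host bucket name
        elif len(fields) >= 3 and fields[2].startswith("osd."):
            mapping[fields[2]] = current_host
    return mapping


if __name__ == "__main__":
    print(osd_to_host(SAMPLE_TREE)["osd.16"])
```

Parsing by field position is fragile if the column order changes between releases, so treat this as a convenience for the format above rather than a general parser.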

	
	
Restart the OSD

[root@hh-yun-ceph-cinder016-128056 ~]# /etc/init.d/ceph stop osd.16
=== osd.16 ===
Stopping Ceph osd.16 on hh-yun-ceph-cinder016-128056...kill 2799859...kill 2799859...done
[root@hh-yun-ceph-cinder016-128056 ~]# /etc/init.d/ceph start osd.16
=== osd.16 ===
create-or-move updated item name 'osd.16' weight 3.64 at location {host=hh-yun-ceph-cinder016-128056,root=default} to crush map
Starting Ceph osd.16 on hh-yun-ceph-cinder016-128056...
Running as unit run-3126361.service.

	
	
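The cluster then runs recovery on that OSD. During recovery the blocked request is disconnected, so the client contacts the mon nodes again, obtains the new pg map, and learns the current location of the data. That resolves the problem described above.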

For reference, the cluster status after recovery:

[root@hh-yun-puppet-129021 ~]# ceph -s
    cluster dc4f91c1-8792-4948-b68f-2fcea75f53b9
     health HEALTH_OK
     monmap e3: 5 mons at {hh-yun-ceph-cinder015-128055=240.30.128.55:6789/0,hh-yun-ceph-cinder017-128057=240.30.128.57:6789/0,hh-yun-ceph-cinder024-128074=240.30.128.74:6789/0,hh-yun-ceph-cinder025-128075=240.30.128.75:6789/0,hh-yun-ceph-cinder026-128076=240.30.128.76:6789/0}, election epoch 216, quorum 0,1,2,3,4 hh-yun-ceph-cinder015-128055,hh-yun-ceph-cinder017-128057,hh-yun-ceph-cinder024-128074,hh-yun-ceph-cinder025-128075,hh-yun-ceph-cinder026-128076
     osdmap e97981: 190 osds: 190 up, 190 in
      pgmap v13669826: 20544 pgs, 2 pools, 77488 GB data, 19510 kobjects
            228 TB used, 426 TB / 654 TB avail
               20541 active+clean
                   3 active+clean+scrubbing+deep
  client io 21801 kB/s rd, 66461 kB/s wr, 2328 op/s
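
Rather than re-running the status command by hand after the restart, one could poll until the cluster reports HEALTH_OK. This is only a sketch: it assumes `ceph health` prints the status word (such as HEALTH_OK or HEALTH_WARN) first, and the function names, polling interval, and retry limit are arbitrary choices, not part of the original procedure.

```python
import subprocess
import time


def ceph_health():
    """Run `ceph health` and return its first word, e.g. 'HEALTH_OK'."""
    out = subprocess.run(
        ["ceph", "health"], capture_output=True, text=True, check=True
    ).stdout
    words = out.split()
    return words[0] if words else ""


def wait_for_health_ok(get_health=ceph_health, interval=5.0, max_checks=60,
                       sleep=time.sleep):
    """Poll until the status is HEALTH_OK; return True on success, False on timeout.

    get_health and sleep are injectable so the loop can be exercised
    without a running cluster.
    """
    for attempt in range(max_checks):
        if get_health() == "HEALTH_OK":
            return True
        if attempt < max_checks - 1:
            sleep(interval)  # wait before the next check
    return False
```

Calling `wait_for_health_ok()` on a node with the `ceph` CLI and admin keyring would check every 5 seconds for up to 60 attempts. Keep in mind that HEALTH_WARN can have causes unrelated to blocked requests, so a timeout here does not by itself mean the restart failed.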
