AIX Tuning Notes - wizardzj - ChinaUnix Blog

AIX Tuning Notes

Category: System Operations

2011-10-28 15:00:00

Memory status

# svmon -G

Top 15 processes by memory usage
SJYD_SJK_1:/#svmon -Pt15 | perl -e 'while(<>){print if($.==2||$&&&!$s++);$.=0 if(/^-+$/)}'

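Tuning to avoid DMS timeouts

The DMS (deadman switch) is a kernel extension that can halt the system before it crashes and produce a dump file for later analysis. To handle node failures correctly, a cluster has to decide whether a node is dead. The deadman switch makes that decision using the failure detection parameters. Problems with I/O, memory, and the like can keep the cluster manager from handling node communication in time, so a node can be taken down by mistake.

Causes of DMS:

The DMS typically fires for the following reasons:

a. An application runs at a higher priority than the clstrmgr daemon, so clstrmgr cannot reset the DMS counter in time.

b. Heavy I/O on the system leaves the CPU no time to respond to the clstrmgr daemon.

c. Memory leaks or memory exhaustion.

d. Heavy system error log activity, for example token-ring beaconing problems.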

Tuning adjustments:

1) Adjust the system's I/O pacing high and low water marks

Officially recommended values:

HIGH water mark for pending write I/Os per file [33]

LOW water mark for pending write I/Os per file [24]

Current system values:

HIGH water mark for pending write I/Os per file [8193]

LOW water mark for pending write I/Os per file [4096]
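
As a rough sketch of the check above: the two water marks are the maxpout and minpout attributes of sys0. The helper below, pacing_fix (a made-up name, not an AIX tool), takes the current values as arguments and prints the chdev command that would set the recommended 33/24. It prints the command only and runs nothing.

```shell
#!/bin/sh
# pacing_fix is a hypothetical helper for illustration only.
# Arguments: current high water mark (maxpout), current low water mark (minpout).
# On AIX these can be read with, for example:
#   lsattr -E -l sys0 -a maxpout -F value
pacing_fix() {
    cur_high=$1
    cur_low=$2
    want_high=33   # recommended HIGH water mark from the notes above
    want_low=24    # recommended LOW water mark from the notes above
    if [ "$cur_high" -eq "$want_high" ] && [ "$cur_low" -eq "$want_low" ]; then
        echo "ok"
    else
        # Print the command for review instead of executing it.
        echo "chdev -l sys0 -a maxpout=$want_high -a minpout=$want_low"
    fi
}
```

With the current values from this system, `pacing_fix 8193 4096` prints the chdev line. Review it before running it on a production node.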

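2) Make syncd flush more often (system default: 60 seconds)

The HA setup on this system has not tuned this interval. Flushing more often reduces the amount of I/O in each sync.

Current system value: 60s

Officially recommended value: 10s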
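
A minimal sketch of changing the interval, assuming syncd is started from /sbin/rc.boot by a line of the form `nohup /usr/sbin/syncd 60 > /dev/null 2>&1 &` (check your own system, the exact line may differ). The helper set_syncd_interval is a made-up name. It only rewrites text read from stdin, so the result can be previewed before touching the real file.

```shell
#!/bin/sh
# set_syncd_interval is a hypothetical helper for illustration only.
# It reads rc.boot-style text on stdin and rewrites the numeric interval
# that follows /usr/sbin/syncd. Lines without syncd pass through unchanged.
set_syncd_interval() {
    new_interval=$1
    sed "s|\(/usr/sbin/syncd\)[ ][ ]*[0-9][0-9]*|\1 ${new_interval}|"
}

# Preview (does not modify /sbin/rc.boot):
#   set_syncd_interval 10 < /sbin/rc.boot | grep syncd
```

The running syncd keeps its old interval until it is restarted with the new argument.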

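3) Slow down the HA heartbeat failure detection rate, FDR (system default: normal)

When the system is under heavy I/O or short on memory and cannot respond to HA heartbeats, a faster detection rate only declares the node dead sooner.

Current system value: normal

Recommended value: slow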

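Network performance tuning

Current network parameter values:

udp_sendspace = 65536

udp_recvspace = 262144

tcp_recvspace = 262144

tcp_sendspace = 262144

sb_max = 1048576

ipqmaxlen = 100

The official guidance is that udp_sendspace = 65536 is sufficient, but udp_recvspace should be 10 times udp_sendspace, that is 655360.

sb_max = 1048576 is already large enough for that value.

So the host network parameters need to be changed:

no -p -o udp_recvspace=655360

The change takes effect dynamically; restarting the inetd process is enough.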
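
The 10x rule can be sketched as a small calculation. The helper udp_recv_cmd below is a made-up name. It takes udp_sendspace and sb_max as arguments, computes udp_recvspace as 10 times udp_sendspace, and prints the matching no command without running it. It refuses values above sb_max, since sb_max caps the socket buffer size.

```shell
#!/bin/sh
# udp_recv_cmd is a hypothetical helper for illustration only.
# Arguments: udp_sendspace, sb_max (both in bytes).
# It prints the command; it does not change any tunable.
udp_recv_cmd() {
    sendspace=$1
    sb_max=$2
    recvspace=$((sendspace * 10))
    # A socket buffer larger than sb_max is not usable, so stop here.
    if [ "$recvspace" -gt "$sb_max" ]; then
        echo "udp_recvspace=$recvspace exceeds sb_max=$sb_max" >&2
        return 1
    fi
    echo "no -p -o udp_recvspace=$recvspace"
}
```

With the values listed above, `udp_recv_cmd 65536 1048576` yields udp_recvspace=655360, which fits under the current sb_max.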

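LUN disk locking: reserve_lock

On both hosts the powerdisk devices currently have reserve_lock set to yes. A colleague saw a constant stream of disk errors on the hosts.

I suspect that whoever installed RAC followed a document downloaded from the internet without reading the official documentation, and that did real harm.
The official Oracle documentation states that in an HACMP + RAC environment the PV attribute reserve_lock (reserve_policy) must be set to no (no_reserve) so that multiple nodes can access the disk concurrently. There are countless cases of this in the industry, and if the attribute is not set the consequences are unpredictable. Quoting the official text: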

To enable simultaneous access to a disk device from multiple nodes, you must set the appropriate Object Data Manager (ODM) attribute listed in the following table to the value shown, depending on the disk type:
Disk Type                                      | Attribute      | Value
SSA, FAStT, or non-MPIO-capable disks          | reserve_lock   | no
ESS, EMC, HDS, CLARiiON, or MPIO-capable disks | reserve_policy | no_reserve
To determine whether the attribute has the correct value, enter a command similar to the following on all cluster nodes for each disk device that you want to use:
# /usr/sbin/lsattr -E -l hdiskn
If the required attribute is not set to the correct value on any node, then enter a command similar to one of the following on that node:
■ SSA and FAStT devices
# /usr/sbin/chdev -l hdiskn -a reserve_lock=no
■ ESS, EMC, HDS, CLARiiON, and MPIO-capable devices
# /usr/sbin/chdev -l hdiskn -a reserve_policy=no_reserve
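
To check many disks at once, the lsattr output can be filtered with a small function. This is only a sketch: check_reserve is a made-up name, and it assumes each lsattr line begins with the attribute name followed by its value.

```shell
#!/bin/sh
# check_reserve is a hypothetical helper for illustration only.
# It reads `lsattr -E -l hdiskN` output on stdin and prints:
#   ok        reserve_lock=no or reserve_policy=no_reserve
#   reserved  the attribute is present with any other value (exit status 1)
#   unknown   neither attribute was found (exit status 2)
check_reserve() {
    awk '
        $1 == "reserve_lock"   { found = 1; ok = ($2 == "no") }
        $1 == "reserve_policy" { found = 1; ok = ($2 == "no_reserve") }
        END {
            if (!found) { print "unknown"; exit 2 }
            if (ok)     { print "ok" }
            else        { print "reserved"; exit 1 }
        }'
}
```

For example, `/usr/sbin/lsattr -E -l hdisk2 | check_reserve`, where hdisk2 is a placeholder for one of the shared disks. Run it on every cluster node.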
