2012-07-19 13:14:09
Original post: AIX Dual-Node Hot Standby in Practice. Author: fengzhanhai
# smit vg
Add a Volume Group with Data Path Devices
  VOLUME GROUP name                              [datavg]   +
  Physical partition SIZE in megabytes           256
* PHYSICAL VOLUME names                          [vpath0 vpath1 vpath2 vpath3 vpath4 vpath5 vpath6 vpath7 vpath8 vpath9 vpath10 vpath11 vpath12 vpath13 vpath14 vpath15 vpath16 vpath17 vpath18 vpath19 vpath20 vpath21 vpath22 vpath23 vpath24 vpath25 vpath26 vpath27 vpath28 vpath29 vpath30]
  Force the creation of a volume group           no         +
  Activate volume group AUTOMATICALLY            no         +
    at system restart?
  Volume Group MAJOR NUMBER                      [61]
  Create VG Concurrent Capable?                  yes
  Auto-varyon in Concurrent Mode?                no
  Create a big VG format Volume Group?           yes
  Create a scalable VG format Volume Group?      no
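* The big VG format raises the volume group limits (up to 128 physical volumes and 512 logical volumes) and is commonly used for database volume groups such as Oracle.
1.1.2 Host A: export the datavg definition for host B
# smit vg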
Export a Volume Group
* VOLUME GROUP name [datavg]
After the export, datavg is no longer visible on this node.
1.1.3 Host A: import the datavg definition
# smit vg
Import a Volume Group
VOLUME GROUP name [datavg]
*PHYSICAL VOLUME name [vpath0]
Volume Group MAJOR NUMBER [61]
1.1.4 Host B: import the datavg definition
# smitty vg
Import a Volume Group
Type or select values in entry fields.
Press Enter AFTER making all desired changes.
                                     [Entry Fields]
  VOLUME GROUP name                  [datavg]
* PHYSICAL VOLUME name               [vpath0]   +
  Volume Group MAJOR NUMBER          [61]       +#
The VG major number must be identical on both hosts!
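One way to confirm the major number is to read it from the volume group's device node on each host. The following is a minimal sketch, assuming that `ls -l /dev/datavg` prints the usual character-device listing in which the major number is the numeric field ending in a comma. The helper name `vg_major` and the sample listing line are illustrative, not taken from the original hosts.

```shell
# vg_major: read an `ls -l` line for a VG device node on stdin and
# print the major number (the first numeric field that ends with a comma).
vg_major() {
    awk '{
        for (i = 1; i <= NF; i++)
            if ($i ~ /^[0-9]+,$/) { sub(/,/, "", $i); print $i; exit }
    }'
}

# Illustrative sample line; on a real host use: ls -l /dev/datavg | vg_major
sample='crw-rw----    1 root     system       61,  0 Jul 19 13:14 /dev/datavg'
major=$(printf '%s\n' "$sample" | vg_major)
echo "datavg major number: $major"
```

Running the same check on host A and host B and comparing the two values shows whether the import used the intended major number.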
1.1.5 Check the VG status
Once the volume group has been created, list the physical volumes on host A and host B:
# lspv
hdisk0 00f64acac2b7e530 rootvg active
hdisk1 00f64aca6b8e4066 rootvg active
vpath0 00f6391f42b893a7 datavg
vpath1 00f6391f42b89562 datavg
vpath2 00f64aca59fb8bd1 datavg
vpath3 00f64aca5f5354fb datavg
vpath4 00f64aca5f53567c datavg
vpath5 00f64aca5f5357db datavg
vpath6 00f64aca5f535988 datavg
vpath7 00f64aca5f535afb datavg
vpath8 00f64aca5f535c50 datavg
vpath9 00f64aca5f535da0 datavg
vpath10 00f64aca5f535ef7 datavg
vpath11 00f64aca5f53606f datavg
vpath12 00f64aca5f5361d6 datavg
vpath13 00f64aca5f53632a datavg
vpath14 00f64aca5f536484 datavg
vpath15 00f64aca5f5365d9 datavg
vpath16 00f64aca5f53672d datavg
vpath17 00f64aca5f536919 datavg
vpath18 00f64aca5f536a86 datavg
vpath19 00f64aca5f536c28 datavg
vpath20 00f64aca5f536dad datavg
vpath21 00f64aca5f536f08 datavg
vpath22 00f64aca5f5370cf datavg
vpath23 00f64aca5f537253 datavg
vpath24 00f64aca5f5373d9 datavg
vpath25 00f64aca5f537542 datavg
vpath26 00f64aca5f5376da datavg
vpath27 00f64aca5f537873 datavg
vpath28 00f64aca5f5379d3 datavg
vpath29 00f64aca5f537b34 datavg
vpath30 00f64aca5f537cd0 datavg
vpath31 00f64aca5f59c330 None
vpath32 00f64aca5f59c4d4 None
vpath33 00f64aca792ecee2 None
vpath34 00f64aca792ed073 None
vpath35 00f6391f5afa6ccf None
vpath36 none None
vpath37 none None
vpath38 none None
vpath39 00f6391f76699d1f None
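With this many disks, comparing the two hosts by eye is error-prone. A rough sketch of an automated comparison follows, assuming the `lspv` column layout shown above (device, PVID, volume group). The function name `vg_pvids` is hypothetical, and the two sample listings are shortened excerpts of the output above, with the second one reordered to imitate a different host.

```shell
# vg_pvids VG: read `lspv` output on stdin and print the PVIDs (column 2)
# of every disk assigned to VG (column 3), sorted for comparison.
vg_pvids() {
    awk -v vg="$1" '$3 == vg { print $2 }' | sort
}

# Shortened sample listings; on real hosts capture `lspv` on each node.
host_a='vpath0 00f6391f42b893a7 datavg
vpath1 00f6391f42b89562 datavg
vpath31 00f64aca5f59c330 None'
host_b='vpath1 00f6391f42b89562 datavg
vpath0 00f6391f42b893a7 datavg
vpath31 00f64aca5f59c330 None'

a=$(printf '%s\n' "$host_a" | vg_pvids datavg)
b=$(printf '%s\n' "$host_b" | vg_pvids datavg)
if [ "$a" = "$b" ]; then
    echo "datavg PVIDs match on both hosts"
else
    echo "datavg PVIDs differ between hosts"
fi
```

Comparing PVIDs rather than device names avoids false alarms when the same disk is numbered differently on the two hosts.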
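1.2 Heartbeat link configuration and testing
When choosing which serial port to use for the heartbeat link, follow this rule:
If enough serial ports are available, avoid the first and second ports. (The first serial port is normally used as the console, and the second is often used by remote maintenance programs.)
If you ignore this rule, synchronization and simple tests may still pass, but in production HACMP tends to fail over spontaneously, so the first and second serial ports should not be used.
1.2.1 Configuring the heartbeat link
# smitty tty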
Add a TTY
  TTY type                                       tty
  TTY interface                                  rs232
  Description                                    Asynchronous Terminal
  Parent adapter                                 sa0
* PORT number                                    [0]        +
  Enable LOGIN                                   disable    +
  BAUD rate                                      [9600]
  PARITY                                         [none]     +
  BITS per character                             [8]        +
  Number of STOP BITS                            [1]        +
  TIME before advancing to next port setting     [0]        +
  TERMINAL type                                  [dumb]
  FLOW CONTROL to be used                        [xon]
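Repeat the same steps on host 2.
1.2.2 Testing the heartbeat link
On both hosts, run lsdev -Cc tty to list the newly added serial port. The output should resemble the following:
tty0 Available 00-00-S3-00 Asynchronous Terminal
No.  Host    Action
1.   Host 1  stty < /dev/tty0
2.   Host 2  stty < /dev/tty0
             Both hosts should now print output on the command line; if not, the tty configuration has failed. Example:
             speed 9600 baud; -parity hupcl
             eol2 = ^?
             brkint -inpck -istrip icrnl -ixany ixoff onlcr tab3
             echo echoe echok
3.   Host 1  cat /etc/hosts > /dev/tty0
4.   Host 2  cat < /dev/tty0
             Host 2 should now display the contents of host 1's /etc/hosts file; if not, the tty configuration has failed.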
2. Installing HACMP
2.1 Installing HACMP 5.4
Insert the HACMP V5.4 software CD into the CD-ROM drive and run the following on the console:
#smitty installp
Install Software
INPUT device / directory for software [/dev/cd0]
SOFTWARE to install [select everything except cluster.haview and cluster.hativoli]
ACCEPT new license agreements? yes
2.2 Upgrading to HACMP 5.4.1
Oracle 10g RAC requires HACMP 5.4.1. The latest 5.4.1 fixes can be downloaded from the official IBM website.
#smitty installp
Update Installed Software to Latest Level (Update All)
ACCEPT new license agreements? yes
2.3 Checking the HACMP version
# lslpp -l | grep cluster
cluster.adt.es.client.include
cluster.adt.es.client.samples.clinfo
cluster.adt.es.client.samples.clstat
cluster.adt.es.client.samples.libcl
cluster.adt.es.java.demo.monitor
cluster.doc.en_US.es.html 5.4.1.0 COMMITTED HAES Web-based HTML
cluster.doc.en_US.es.pdf 5.4.1.0 COMMITTED HAES PDF Documentation - U.S.
cluster.es.cfs.rte 5.4.1.6 COMMITTED ES Cluster File System Support
cluster.es.client.lib 5.4.1.7 COMMITTED ES Client Libraries
cluster.es.client.rte 5.4.1.10 COMMITTED ES Client Runtime
cluster.es.client.utils 5.4.1.9 COMMITTED ES Client Utilities
cluster.es.client.wsm 5.4.1.7 COMMITTED Web based Smit
cluster.es.cspoc.cmds 5.4.1.11 COMMITTED ES CSPOC Commands
cluster.es.cspoc.dsh 5.4.1.0 COMMITTED ES CSPOC dsh
cluster.es.cspoc.rte 5.4.1.6 COMMITTED ES CSPOC Runtime Commands
cluster.es.plugins.dhcp 5.4.1.0 COMMITTED ES Plugins - dhcp
cluster.es.plugins.dns 5.4.1.0 COMMITTED ES Plugins - Name Server
cluster.es.plugins.printserver
cluster.es.server.cfgast 5.4.1.0 COMMITTED ES Two-Node Configuration
cluster.es.server.diag 5.4.1.11 COMMITTED ES Server Diags
cluster.es.server.events 5.4.1.11 COMMITTED ES Server Events
cluster.es.server.rte 5.4.1.11 COMMITTED ES Base Server Runtime
cluster.es.server.testtool
cluster.es.server.utils 5.4.1.11 COMMITTED ES Server Utilities
cluster.es.worksheets 5.4.1.5 COMMITTED Online Planning Worksheets
cluster.license 5.4.1.1 COMMITTED HACMP Electronic License
cluster.msg.en_US.cspoc 5.4.1.0 COMMITTED HACMP CSPOC Messages - U.S.
cluster.msg.en_US.es.client
cluster.msg.en_US.es.server
cluster.es.client.lib 5.4.1.7 COMMITTED ES Client Libraries
cluster.es.client.rte 5.4.1.10 COMMITTED ES Client Runtime
cluster.es.cspoc.rte 5.4.0.0 COMMITTED ES CSPOC Runtime Commands
cluster.es.server.diag 5.4.0.0 COMMITTED ES Server Diags
cluster.es.server.events 5.4.0.0 COMMITTED ES Server Events
cluster.es.server.rte 5.4.1.11 COMMITTED ES Base Server Runtime
cluster.es.server.utils 5.4.1.11 COMMITTED ES Server Utilities
cluster.man.en_US.es.data 5.4.1.6 COMMITTED ES Man Pages - U.S. English
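To spot filesets that the update left at an older level, the listing can be filtered for levels that do not start with 5.4.1. This is a minimal sketch, assuming the `lslpp -l` layout shown above, with the fileset name in the first column and the level in the second. The function name `below_541` is hypothetical, and the sample lines are copied from the listing above.

```shell
# below_541: read `lslpp -l` lines on stdin and print the name and level of
# every fileset whose level (column 2) does not start with 5.4.1.
# Lines where the level wrapped onto the next row are skipped by this sketch.
below_541() {
    awk '$2 ~ /^[0-9]+\.[0-9]+\./ && $2 !~ /^5\.4\.1\./ { print $1, $2 }'
}

# Sample lines taken from the listing; on a real host use:
#   lslpp -l | grep cluster | below_541
sample='cluster.es.client.rte 5.4.1.10 COMMITTED ES Client Runtime
cluster.es.cspoc.rte 5.4.0.0 COMMITTED ES CSPOC Runtime Commands
cluster.es.server.rte 5.4.1.11 COMMITTED ES Base Server Runtime'

stale=$(printf '%s\n' "$sample" | below_541)
echo "$stale"
```

An empty result suggests that every fileset with a level on the same line is already at 5.4.1.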
3. HACMP configuration for the OA database
The following steps need to be performed on one node only.
# smitty hacmp
Extended Configuration
Extended Topology Configuration
Configure an HACMP Cluster
Add/Change/Show an HACMP Cluster
* Cluster Name [oadb_cl]
Nodes
# smitty hacmp
Extended Configuration
Extended Topology Configuration
Configure HACMP Nodes
Add a Node to the HACMP Cluster
* Node Name [hostA]
Communication Path to Node [hostA] +
Add host B to the cluster in the same way:
* Node Name [hostB]
Communication Path to Node [hostB] +
Networks
# smitty hacmp
Extended Configuration
Extended Topology Configuration
Configure HACMP Networks
Add a Network to the HACMP Cluster
* NetworkName [net_ether_01]
* NetworkType ether
* Netmask [255.255.255.0] +
* Enable IP Address Takeover via IP Aliases [No] +
IP Address Offset for Heartbeating over IP Aliases []
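* Enable IP Address Takeover via IP Aliases [Yes]: this option determines how HACMP moves IP addresses during takeover. Use IP Aliases when the "boot", "standby" and "service" IPs are on three different subnets. If either the "boot" or the "standby" IP is on the same subnet as the "service" IP, IP Replacement must be used, and this option should be set to "No".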
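The subnet rule can be checked mechanically before filling in the SMIT screen. Below is a rough sketch that encodes only the two cases named above: it reports "replacement" when the service IP shares a subnet with the boot or standby IP, and "aliases" otherwise. The function names and all addresses are hypothetical and serve only as an illustration.

```shell
# ip_to_int A.B.C.D: print the dotted-quad address as a 32-bit integer.
ip_to_int() {
    old_ifs=$IFS; IFS=.
    set -- $1
    IFS=$old_ifs
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# same_subnet IP1 IP2 MASK: succeed if both addresses are in the same network.
same_subnet() {
    m=$(ip_to_int "$3")
    [ $(( $(ip_to_int "$1") & m )) -eq $(( $(ip_to_int "$2") & m )) ]
}

# ipat_mode BOOT STANDBY SERVICE MASK: print the takeover method that the
# rule in the text suggests for this address layout.
ipat_mode() {
    if same_subnet "$1" "$3" "$4" || same_subnet "$2" "$3" "$4"; then
        echo "replacement"
    else
        echo "aliases"
    fi
}

# Hypothetical addresses, for illustration only.
ipat_mode 192.168.1.10 192.168.2.10 10.19.98.18 255.255.255.0
ipat_mode 10.19.98.17 192.168.2.10 10.19.98.18 255.255.255.0
```

The first call has all three addresses on different /24 networks, the second puts the boot address on the service subnet.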
# smitty hacmp
Extended Configuration
Extended Topology Configuration
Configure HACMP Networks
Add a Network to the HACMP Cluster
* NetworkName [net_rs232_01]
* Network Type rs232
Communication interfaces
# smitty hacmp
Extended Configuration
Extended Topology Configuration
Configure HACMP Networks
Configure HACMP Communication Interfaces/Devices
Add Communication Interfaces/Devices
Add Pre-defined Communication Interfaces and Devices
Communication Interfaces
* IP Label/Address [hostA] +
* Network Type ether
* Network Name net_ether_01
* Node Name [hostA] +
Network Interface []
Add the other communication interface to "net_ether_01" in the same way:
* IP Label/Address [hostB] +
* Network Type ether
* Network Name net_ether_01
* Node Name [hostB] +
Network Interface []
Create the remaining communication interfaces in the same way:
* IP Label/Address [hostA_priv] +
* Network Type ether
* Network Name net_ether_02
* Node Name [hostA] +
Network Interface []
* IP Label/Address [hostB_priv] +
* Network Type ether
* Network Name net_ether_02
* Node Name [hostB] +
Network Interface []
# smitty hacmp
Extended Configuration
Extended Topology Configuration
Configure HACMP Networks
Configure HACMP Communication Interfaces/Devices
Add Communication Interfaces/Devices
Add Pre-defined Communication Interfaces and Devices
Communication Devices
* Device Name [osdb1_tty0]
* Network Type rs232
* Network Name net_rs232_01
* Device Path [/dev/tty0]
* Node Name [hostA] +
Create the other serial communication device in the same way:
* Device Name [osdb2_tty0]
* Network Type rs232
* Network Name net_rs232_01
* Device Path [/dev/tty0]
* Node Name [hostB] +
Resources
Service IP
# smitty hacmp
Extended Configuration
Extended Resource Configuration
HACMP Extended Resources Configuration
Configure HACMP Service IP Labels/Addresses
Add a Service IP Label/Address
Configurable on Multiple Nodes
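* IP Label/Address hostA_svc
* Network Name net_ether_01
Configure the second service IP in the same way:
* IP Label/Address hostB_svc
* Network Name net_ether_01
The service IPs are configured here only so that HACMP can tell the boot IPs from the service IPs. Without them, verification reports an error.
3.6.2 Creating the resource group
# smitty hacmp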
Extended Configuration
Extended Resource Configuration
HACMP Extended Resource Group Configuration
Add a Resource Group
* Resource Group Name [xckyres]
Participating Nodes (Default Node Priority) [hostA hostB]
Startup Policy Online On All Available Nodes > +
Fallover Policy Bring Offline > +
Fallback Policy Never Fallback > +
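"Participating Nodes" determines the priority of the nodes in the resource group: a node listed earlier has higher priority than a node listed later.
Choose the resource group's "Startup Policy", "Fallover Policy" and "Fallback Policy" according to your actual requirements.
Startup Policy:
- Online On Home Node Only: the resource group starts only on the home node, that is, the first node listed in "Participating Nodes".
- Online On First Available Node: the resource group starts on whichever of the nodes listed in "Participating Nodes" comes up first.
- Online Using Distribution Policy: the resource group starts according to the distribution policy.
- Online On All Available Nodes: the resource group starts on every node that is up. Choose this for concurrent clusters such as Oracle RAC.
Fallover Policy:
- Fallover To Next Priority Node In The List: when a node fails, the resource group moves to the node with the next-highest priority.
- Fallover Using Dynamic Node Priority: when a node fails, the target node is chosen dynamically.
- Bring Offline (On Error Node Only): the resource group is taken offline.
Fallback Policy:
- Fallback To Higher Priority Node In The List: when a node recovers, the resource group returns to the higher-priority node.
- Never Fallback: the resource group is never moved back.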
# smitty hacmp
Extended Configuration
Extended Resource Configuration
Change/Show Resources and Attributes for a Resource Group
Resource Group Name oadbcon_rg
Participating Nodes (Default Node Priority) hostA hostB
Startup Policy Online On All Available Nodes
Fallover Policy Bring Offline (On Error Node Only)
Fallback Policy Never Fallback
Concurrent Volume Groups [datavg] +
Use forced varyon of volume groups, if necessary false +
Automatically Import Volume Groups false +
Application Servers [] +
Tape Resources [] +
Raw Disk PVIDs [] +
Disk Fencing Activated false +
Fast Connect Services [] +
Communication Links [] +
Workload Manager Class [] +
Miscellaneous Data []
Verifying and synchronizing the configuration
# smitty hacmp
Extended Configuration
Extended Verification and Synchronization
* Verify, Synchronize or Both [Both] +
* Automatically correct errors found during [Interactively] +
verification?
* Force synchronization if verification fails? [No] +
* Verify changes only? [No] +
* Logging [Standard] +
Starting HACMP (fast path: # smitty clstart)
# smitty hacmp
System Management (C-SPOC)
Manage HACMP Services
Start Cluster Services
* Start now, on system restart or both now
Start Cluster Services on these nodes [hostA hostB]
* Manage Resource Groups Automatically
BROADCAST message at startup? true
Startup Cluster Information Daemon? false
Ignore verification errors? false
Automatically correct errors found during Interactively
cluster start?
Stopping HACMP (fast path: # smitty clstop)
# smitty hacmp
System Management (C-SPOC)
Manage HACMP Services
Stop Cluster Services
* Stop now, on system restart or both now +
Stop Cluster Services on these nodes [hostA hostB]
BROADCAST cluster shutdown? true +
4.1 HACMP synchronization fails
The error messages are as follows:
ERROR: Verification of Cluster Topology for RSCT failed. See "/var/ha/log/topsvcs.default" for detailed information.
WARNING: File 'netmon.cf' is missing or empty on node gdapp1. This file is needed for a cluster with the single-adapter network net_rs232_01. Please create 'netmon.cf' file on node gdapp1 as described in 'HACMP Planning and Installation Guide'.
WARNING: File 'netmon.cf' is missing or empty on node gdapp2. This file is needed for a cluster with the single-adapter network net_rs232_01. Please create 'netmon.cf' file on node gdapp2 as described in 'HACMP Planning and Installation Guide'.
Solution:
On both nodes, check whether a netmon.cf file exists in the /usr/sbin/cluster directory. If it does not, create it and add the content below. If it does, verify that its content is as follows:
10.19.98.18
10.19.98.22
10.19.98.17
10.19.98.19
192.168.254.3
192.168.254.4
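Creating the file by hand on two nodes invites typos, so it can help to generate it. The sketch below assumes that netmon.cf holds one address per line. The helper name `write_netmon_cf` is hypothetical, and the example deliberately writes to a scratch path rather than to the cluster directory.

```shell
# write_netmon_cf FILE ADDR...: write each address on its own line to FILE.
write_netmon_cf() {
    file=$1
    shift
    printf '%s\n' "$@" > "$file"
}

# Scratch location for this illustration; on a cluster node the target would
# be the netmon.cf file in the cluster directory named in the text.
out="${TMPDIR:-/tmp}/netmon.cf.example"
write_netmon_cf "$out" 10.19.98.18 10.19.98.22 10.19.98.17 \
    10.19.98.19 192.168.254.3 192.168.254.4
cat "$out"
```

Running the same command on both nodes keeps the two copies identical, after which verification and synchronization can be repeated.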
Run as root:
#ps -ef|grep ora
Make sure that no Oracle processes are running.
Check the HACMP processes:
#lssrc -g cluster
Individual HACMP subsystems can be stopped with stopsrc -s service_name, for example:
#stopsrc -s clinfo
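When several subsystems are still active, the `lssrc` listing can be turned into the matching `stopsrc` commands. This is a minimal sketch, assuming the usual four-column `lssrc` layout with a header row and the status in the last column. The function name `active_subsystems`, the sample listing and its PID are illustrative, and the loop only prints the commands instead of running them.

```shell
# active_subsystems: read `lssrc -g <group>` output on stdin and print the
# name of every subsystem whose last column is "active" (header skipped).
active_subsystems() {
    awk 'NR > 1 && $NF == "active" { print $1 }'
}

# Illustrative listing; on a real node use: lssrc -g cluster | active_subsystems
sample='Subsystem         Group            PID          Status
 clstrmgrES       cluster          245860       active
 clinfoES         cluster                       inoperative'

# Dry run: show the commands that would stop the active subsystems.
for s in $(printf '%s\n' "$sample" | active_subsystems); do
    echo "stopsrc -s $s"
done
```

Printing the commands first gives a chance to review them before anything is stopped.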