Category: LINUX

2013-06-08 10:31:57

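What is NewStart HA? What does it do? How do you set it up and use it? New technology always raises questions like these, so let's work through them together.

HA stands for High Availability. NewStart HA is a two-node cluster software package that provides high availability and keeps services running continuously. It is common in enterprises that need services to run around the clock (N*24 hours), such as telecom companies. With these basics covered, the rest of this article explains in detail how to build and use a two-node cluster on Linux.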

 

I. Preparation

Good work needs good tools. To set up and use NewStart HA efficiently on Linux, get the groundwork done first.

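1. Concepts:

• Node: a computer running the high-availability two-node cluster software.

• Work link: the link through which the cluster provides services to the outside, i.e. the link from the server to the switch.

• Heartbeat link: a link that maintains the cluster software's internal interconnect and carries heartbeat messages.

• Service: a set of resources associated with a user application. It typically includes an application script that manages the user's processes (application), network resources, and storage resources. For example, an Oracle database service includes the script that manages Oracle (start, stop, and monitor), an IP address, and the disks that need to be mounted. A service can combine some or all of these resource types.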

 

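2. Hardware (two physical machines, both equipped as follows):

• Three NICs: two bonded together as the work link, and one used as a heartbeat link (the total number of heartbeat links must be at least 2)

• Serial port: used for a serial heartbeat link, which together with the network heartbeat link above brings the total to 2

• Disk array: holds the shared data. It is advisable to set aside a 30-50 MB partition as a lock disk (quorum disk), a mechanism that protects data safety; it is optional but recommended. Here it is /dev/sdb1.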
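The article does not show how the two work-link NICs were bonded. As a rough sketch only, on SLES 11 a bond is usually described in /etc/sysconfig/network/ifcfg-bond0 with shell-style variables along these lines. The slave names eth1/eth2, the active-backup mode, and the miimon value are assumptions chosen for illustration; the address is the one entered for node suse11-1 during cluster-init.

```shell
# Sketch of /etc/sysconfig/network/ifcfg-bond0 on SLES 11.
# ASSUMPTIONS: eth1/eth2 as slaves, active-backup mode, miimon=100.
STARTMODE='auto'
BOOTPROTO='static'
IPADDR='192.168.1.92/24'          # node suse11-1; use 192.168.1.93 on suse11-2
BONDING_MASTER='yes'
BONDING_MODULE_OPTS='mode=active-backup miimon=100'
BONDING_SLAVE0='eth1'
BONDING_SLAVE1='eth2'
```

Once the interface is up, /proc/net/bonding/bond0 shows the state of the bond and of each slave.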

 

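3. Software:

• Operating system: SLES 11. All mainstream platforms are supported, such as SLES 9/10/11, Red Hat 5/6, and CGSL V3/V4.

• HA version 3.0.1.07, downloaded from the NewStart website; it is the latest version at the time of writing.

• Database: Oracle 10g

• Middleware: Tomcat 6.0

PS: Installing, configuring, and debugging the operating system, database, and middleware is not covered here; plenty of references are available online. Before starting the steps below, all applications have already been tested on both servers and run correctly on each. Next comes the NewStart HA installation.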

 

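II. Installing NewStart HA

The installer downloaded from the website is an ISO file. Upload it to the /home directory on the server using binary (bin) transfer mode, then mount it on /mnt: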

# mount -o loop  /home/xxxx.iso /mnt

 

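Installation process:

Run the install script to start the installation, and choose 3 to install all components (server program + command-line administrative tool + web administrative tool):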

# /mnt/install

HA Version:

       1)New Version:3.0.1.07

       2)Cancel

 

please select Version [1-2]?1

                NewStart HA Installation Program

                Version: 3.0.1.07

                Support email:  ha-support@gd-linux.com

 

        1)NewStart HA Server Program and CLI Administrative Tool

        2)Web-based Administrative Tool (options)(version: 20121101)

        3)All components

        4)Cancel

 

select the components to be installed [1-4]? 3

Checking NewStart HA ...                NOT running

 

Installing ...

Installing the

 /mnt/nsha/x86/sles9/newstartha-3.0.1.07-20130107.i586.rpm ...

Preparing...    ########################################### [100%]

1:newstartha    ########################################### [100%]

newstartha      0:off  1:off  2:off  3:on   4:off  5:on   6:off

Installing liblvm2clusterlock.so ok.

Enter the product serial number (the SN below is a trial one):

please enter the SN: 00TB24-FC0TCF-629A1H-B00D46

 

Make /etc/ha.d/lic/newstartha.key succeeded.

                                                                        [OK]

 

web-based administrative tool install, deploying, please wait...

jdk installed ok!

tomcat installed ok!

web-based administrative tool installed ok!

 

Create keys(/usr/lib/newstartha/keystore.exp 1), please wait...

Create tomcat.keystore OK.

 

Do you want to start web-based administrative tool automatically as a system service? y(es) or n(o)? y   (whether to start the web administrative tool automatically at system boot)

 

Starting Web-based Administrative Tool Service ...

[OK]

Please remember to change the default web password immediately!

 

The component(s) is installed completely.

The HA installation is complete. Repeat the steps above on the other server; once both servers are done, continue below.

 

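Requesting a license

After installation, request a license. At startup HA verifies that the key and license files are valid, and it will not start otherwise. Procedure:

1. Package the /etc/ha.d/lic/newstartha.key files from both servers (give them distinguishable names, such as newstartha.key_node1/2), download them in binary (bin) mode, and send them by email to request the license file.

2. Rename the license file you receive to newstartha.lic, upload it to the server in binary (bin) mode, and place it in the /etc/ha.d/lic/ directory.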
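Step 1 can be sketched as follows. Both servers store the key under the same file name, so each copy is renamed per node before packaging. The helper names key_copy_name and collect_key are hypothetical, and the node labels are whatever you choose.

```shell
#!/bin/sh
# key_copy_name NODE: name for the copy of one node's key file,
# e.g. newstartha.key_node1 (naming convention from the article).
key_copy_name() {
    printf 'newstartha.key_%s\n' "$1"
}

# collect_key NODE [SRC] [DESTDIR]: copy the key under its per-node name.
# SRC defaults to the path the installer reported; DESTDIR to the current dir.
collect_key() {
    node=$1
    src=${2:-/etc/ha.d/lic/newstartha.key}
    dest=${3:-.}
    cp "$src" "$dest/$(key_copy_name "$node")"
}
```

For example, run `collect_key node1` on the first server and `collect_key node2` on the second, gather both copies in one place, and pack them with `tar czf newstartha_keys.tar.gz newstartha.key_node1 newstartha.key_node2` (the archive name is also just an example).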

 

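Writing the HA scripts that manage the applications (Oracle and Tomcat)

An HA script defines how to start, stop, force-stop, and check an application. NewStart HA provides script templates for mainstream applications such as Apache, Tomcat, and Oracle for reference. They are located in the /etc/ha.d/resource.d directory and are named in the form xxxx_example.ps.

Writing the Oracle and Tomcat HA scripts: go to that directory, copy the templates oracle_example.ps and tomcat_example.ps, rename them to oracle.ps and tomcat.ps respectively, and copy them to the /home/script/ directory. Finally, edit the few variables at the top of each script to match your environment, as follows: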

#vi /home/script/oracle.ps

#The following three variant should be set to proper value

ORACLE_HOME="/home/oracle_home"

ORACLE_SID="orcl"

ALERTLOG="${ORACLE_HOME}/admin/${ORACLE_SID}/bdump/alert_${ORACLE_SID}.log"

 

#vi /home/script/tomcat.ps

#The following variants should be set correctly

PORT=80                                 # tomcat listen port

BINPWD=/opt/NewStartHA/web/tomcat/bin   # tomcat bin path
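Before handing the scripts to HA, it can help to sanity-check the values just edited. The sketch below is not part of NewStart HA; the helper names are hypothetical. It derives the alert log path with the same layout as the ALERTLOG line above and checks that a port value is a plausible TCP port.

```shell
# alert_log_path ORACLE_HOME ORACLE_SID: print the alert log path using the
# same layout as the ALERTLOG line in oracle.ps.
alert_log_path() {
    printf '%s/admin/%s/bdump/alert_%s.log\n' "$1" "$2" "$2"
}

# port_is_valid PORT: succeed if PORT is an integer in 1..65535.
port_is_valid() {
    case $1 in
        ''|*[!0-9]*) return 1 ;;
    esac
    [ "$1" -ge 1 ] && [ "$1" -le 65535 ]
}

alert_log_path /home/oracle_home orcl
port_is_valid 80 && echo "PORT 80 looks valid"
```

If the printed path does not exist on the node that currently has the shared disk mounted, the ORACLE_HOME or ORACLE_SID value is probably wrong for that environment.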

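III. Configuring NewStart HA

Configuration has two steps, cluster initialization and service initialization, and they must be done in that order. HA can be configured with either the command-line (cli) or the web administrative tool; the cli procedure is shown below.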

 

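Before configuring, confirm the following:

1. The hostnames of both servers;

2. The NIC names used for the heartbeat and work links correspond and are identical on both servers, and every NIC has a static IP configured;

3. The floating IPs used to access Oracle and Tomcat;

4. The location of the HA scripts (/home/script/oracle.ps and tomcat.ps);

5. The mount directory of the disk array (created when Oracle was installed; here it is /home/db);

6. Third-party IP list: optional; configuring 3 to 5 IPs is recommended. These IPs are in the same subnet as the work NIC and must not be the IPs of the two servers themselves. Their purpose is to let a node check whether its own network is working.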
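The rule in item 6 can be checked mechanically. The sketch below assumes a /24 network (netmask 255.255.255.0), so that two addresses are in the same subnet when their first three octets match; the function names are hypothetical, and the node IPs are the ones entered later during cluster-init.

```shell
# same_subnet24 IP1 IP2: succeed if both addresses share the first three
# octets, i.e. they are in the same /24. ASSUMPTION: netmask 255.255.255.0.
same_subnet24() {
    [ "${1%.*}" = "${2%.*}" ]
}

# third_party_ok CANDIDATE NODE_IP...: a candidate is acceptable if it is in
# the same /24 as the first node IP and is not one of the node IPs itself.
third_party_ok() {
    candidate=$1
    shift
    same_subnet24 "$candidate" "$1" || return 1
    for node_ip in "$@"; do
        [ "$candidate" = "$node_ip" ] && return 1
    done
    return 0
}

for ip in 192.168.1.19 192.168.1.92 192.168.2.20; do
    if third_party_ok "$ip" 192.168.1.92 192.168.1.93; then
        echo "$ip ok"
    else
        echo "$ip rejected"
    fi
done
```

Here 192.168.1.19 is accepted, 192.168.1.92 is rejected because it is a node address, and 192.168.2.20 is rejected because it is in a different subnet.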

 

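Cluster initialization, command: cluster-init

At the shell prompt, run the cli command to enter the cli administrative tool, then run cluster-init. One note before starting: throughout the cluster configuration below, the text after each prompt is either the value entered for this environment or an explanatory note, and "press Enter" means accepting the default, which is the recommended setting.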

cli:~>cluster-init

 

======================================

    Cluster Initialization Utility   

======================================

 

This utility sets up the initialization information of a 2-node cluster.

It prompts you for the following information:

        - Hostname

        - Information about the heartbeat channels

        - How long between heartbeat

        - How long to declare heartbeat fails

        - Watchdog configuration

        - Lock disk configuration

 

Please input cluster name:cluster_ora   (user-defined cluster name)

Input the first  node name and IP:suse11-1 192.168.1.92

Input the second node name and IP:suse11-2 192.168.1.93 

How long between heartbeats(in seconds)[1]: (press Enter)

How long to declare heartbeat has broken(in seconds)[60]: (press Enter)

Do you want to enable watchdog device ? (yes/no)[no]: (press Enter)

Please choose multicast heartbeat channel:

        0) eth0

        1) bond0

Select a multicast heartbeat channel [0, 1]:0

Another multicast heartbeat channel? (yes/no)[yes]:no

Do you want to add a serial heartbeat channel? (yes/no)[yes]: (press Enter)

Input serial heartbeat channel[/dev/ttyS0]: (press Enter)

Another serial heartbeat channel? (yes/no)[yes]:no

Do you want to enable worklink_hb ? (yes/no)[yes]: (press Enter)

Do you want to add third-party ip list ? [recommended 3-5 ip]  (yes/no)[yes]: (press Enter)

Please input a third-party ip address:192.168.1.19

Another thirdpart ip address? (yes/no)[yes]: (press Enter)

Please input a third-party ip address:192.168.1.20

Another thirdpart ip address? (yes/no)[yes]: (press Enter)

Please input a third-party ip address:192.168.1.21

Another thirdpart ip address? (yes/no)[yes]:no

Do you want to add a lock disk(recommend) ? (yes/no)[yes]: (press Enter)

Please input the partition name (/dev/sdb):/dev/sdb1   (lock disk)

 

Warning:All data in /dev/sdb1 will be destroyed, sure to format it? (yes/no)[no]:yes

Do you want to enable kernel panic ? (yes/no)[no]: (press Enter)

Please run service-init to initialize you services. 

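Cluster initialization is complete. Next comes service initialization.

 

Service initialization, command: service-init

Two services are configured here: the Oracle database first, then Tomcat. Run service-init in the cli administrative tool to initialize the services.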

cli:~>service-init

 

======================================

    Service Initialization Utility   

======================================

 

This utility sets up the initialization information of the service in the HA system.

It prompts you for the following information:

        - Service information

        - Application resource information

        - Public net work interface information

        - Floating IP address information.

        - Block Disk information

        - Mount information

        - Raw Disk information

 

Input service name:oracle   (user-defined service name: oracle)

Is it enabled?(yes/no)[yes]:

Do you want to configure preferred node ? (yes/no)[no]:yes

Please choose preferred node:

        0) suse11-1

        1) suse11-2

Select a node: [0, 1]:0

Input start time out[60]: (press Enter)

Input stop time out[120]: (press Enter)

Input check interval[30]: (press Enter)

Input check time out[60]: (press Enter)

Input max error count[1]: (press Enter)

Restart after check result is failed?(yes/no)[no]: (press Enter)

Start service anyway when float IP exist?(yes/no)[no]: (press Enter)

Do you want to add a application? (yes/no)[yes]: (press Enter)

 

====== Application ======

Input name of application[oracle_app_0]: (press Enter)

Input script of application

[/etc/ha.d/resource.d/oracle]:/home/script/oracle.ps  (script that manages Oracle)

Is resource critical?[yes]: (press Enter)

Is resource enable?[yes]: (press Enter)

Add another application? (yes/no)[no]: (press Enter)

Do you want to add a pubnic? (yes/no)[yes]: (press Enter)

 

====== PubNIC ======

Input PubNIC name[oracle_net_card_0]: (press Enter)

Is resource critical?[yes]: (press Enter)

Please choose network device:

        0) eth0

        1) bond0

Select a network device [0, 1]:1

Add another pubnic? (yes/no)[no]: (press Enter)

 

====== IP ======

Input IP name[oracle_ip_0]: (press Enter)

Input IP address:192.168.1.96    (floating/service IP)

Input netmask[255.255.255.0]:

PubNIC of service:

     0) oracle_net_card_0    suse11-1:bond0    suse11-2:bond0

Select a PubNIC: [0, 0]:0

Is resource critical?[yes]: (press Enter)

Add another IP? (yes/no)[no]: (press Enter)

Do you want to add a raw disk? (yes/no)[no]:   (press Enter)

Do you want to add a diskmount? (yes/no)[no]:yes

 

====== diskmount ======

Input diskmount name[oracle_diskmount_1]: (press Enter)

Is resource critical?[yes]: (press Enter)

Is resource enable?[yes]: (press Enter)

        0) disk   (ordinary block device)

        1) nfs    (NFS device)

        2) lvm    (logical volume device)

        3) cancel

please choose a disk type? [0, 3]:0

Input block disk device[/dev/hda1]:/dev/sdb2   (device holding the shared data)

Input mountpoint:/home/db    (mount directory)

Input type of file system[ext3]: (press Enter)

Input user[root]:oracle    (user that operates on the mount directory)

Input group[root]:oinstall   (group of that user)

Input mode[755]: (press Enter)

Input options[rw]: (press Enter)

Input the quota of the device[90]: (press Enter)

do you want to stop service when the disk is readonly?[yes]: (press Enter)

Add another diskmount? (yes/no)[no]: (press Enter)

Add another service? (yes/no)[no]: yes

Input service name:tomcat   (user-defined service name: tomcat)

Is it enabled?(yes/no)[yes]:

Do you want to configure preferred node ? (yes/no)[no]:yes

Please choose preferred node:

        0) suse11-1

        1) suse11-2

Select a node: [0, 1]:1

Input start time out[60]: (press Enter)

Input stop time out[120]: (press Enter)

Input check interval[30]: (press Enter)

Input check time out[60]: (press Enter)

Input max error count[1]: (press Enter)

Restart after check result is failed?(yes/no)[no]: (press Enter)

Start service anyway when float IP exist?(yes/no)[no]: (press Enter)

Do you want to add a application? (yes/no)[yes]: (press Enter)

 

====== Application ======

Input name of application[tomcat_app_0]: (press Enter)

Input script of application

[/etc/ha.d/resource.d/tomcat]:/home/script/tomcat.ps  (script that manages Tomcat)

Is resource critical?[yes]: (press Enter)

Is resource enable?[yes]: (press Enter)

Add another application? (yes/no)[no]: (press Enter)

Do you want to add a pubnic? (yes/no)[yes]: (press Enter)

 

====== PubNIC ======

Input PubNIC name[tomcat_net_card_0]: (press Enter)

Is resource critical?[yes]: (press Enter)

Please choose network device:

        0) eth0

        1) bond0

Select a network device [0, 1]:1

Add another pubnic? (yes/no)[no]: (press Enter)

 

====== IP ======

Input IP name[oracle_ip_0]: (press Enter)

Input IP address:192.168.1.97    (floating/service IP)

Input netmask[255.255.255.0]:

PubNIC of service:

     0) tomcat_net_card_0    suse11-1:bond0    suse11-2:bond0

Select a PubNIC: [0, 0]:0

Is resource critical?[yes]: (press Enter)

Add another IP? (yes/no)[no]: (press Enter)

Do you want to add a raw disk? (yes/no)[no]:   (press Enter)

Do you want to add a diskmount? (yes/no)[no]:   (press Enter)

Add another service? (yes/no)[no]:   (press Enter)

Please run cluster-start to start the HA system, 

or run cluster-restart to restart the HA system.

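Service initialization is complete. Do not start the cluster yet; leave it as it is. The reason is explained next.

 

HA script check

The Oracle and Tomcat scripts were written earlier, but in a real environment you still need to verify that they can fully manage the applications. For this purpose HA provides the check-script tool as a quick way to verify them. Before running it, confirm that the cluster is stopped; check with: cluster-stat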

cli:~>cluster-stat

The HA system is not running now.

 

cli:~>check-script

Current service:

        0) name: oracle

        1) name: tomcat

        2) cancel

Select a(n) service [0, 2]:0

Current Application:

        0) script: /home/script/oracle.ps

        1) cancel

Select a(n) Application [0, 1]:0

 

Begin to test resource script......

Start resource oracle.ps:                                       pass

Check resource oracle.ps when running:                          pass

Start resource oracle.ps when running:                          pass

Check resource oracle.ps when running:                          pass

Stop resource oracle.ps when running:                           pass

Check resource oracle.ps when stopped:                          pass

Stop resource oracle.ps when stopped:                           pass

Check resource oracle.ps when stopped:                          pass

Start resource oracle.ps:                                       pass

Forcedstop resource oracle.ps when running:                     pass

Check resource oracle.ps when stopped:                          pass

Forcedstop resource oracle.ps when stopped:                     pass

Check resource oracle.ps when stopped:                          pass

 

End to test resource

The Oracle script check passed: every item is pass, no problems.

 

cli:~>check-script

Current service:

        0) name: oracle

        1) name: tomcat

        2) cancel

Select a(n) service [0, 2]:1

Current Application:

        0) script: /home/script/tomcat.ps

        1) cancel

Select a(n) Application [0, 1]:0

 

Begin to test resource script......

Start resource tomcat.ps:                                      pass

Check resource tomcat.ps when running:                         pass

Start resource tomcat.ps when running:                         pass

Check resource tomcat.ps when running:                         pass

Stop resource tomcat.ps when running:                          pass

Check resource tomcat.ps when stopped:                         pass

Stop resource tomcat.ps when stopped:                          pass

Check resource tomcat.ps when stopped:                         pass

Start resource tomcat.ps:                                      pass

Forcedstop resource tomcat.ps when running:                    pass

Check resource tomcat.ps when stopped:                         pass

Forcedstop resource tomcat.ps when stopped:                    pass

Check resource tomcat.ps when stopped:                         pass

 

End to test resource

 

The Tomcat script check passed: every item is pass, no problems.
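With thirteen result lines per script, a failure is easy to miss by eye. As a minimal sketch, assuming the check-script output has been saved to a text file (for instance from a terminal capture), the hypothetical helper below prints every test line whose last field is not "pass". It only relies on the line format shown in the transcripts above; what a failing line actually looks like is an assumption.

```shell
# failed_checks FILE: print each check-script result line whose last field
# is not "pass"; the exit status is 0 only when every test line passed.
failed_checks() {
    awk '
        /^(Start|Check|Stop|Forcedstop) resource / && $NF != "pass" { print; bad = 1 }
        END { exit (bad ? 1 : 0) }
    ' "$1"
}
```

For example, `failed_checks oracle_check.txt && echo "all passed"`, where oracle_check.txt is a hypothetical file name.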

 

IV. Starting the cluster and querying its status

1. Start the cluster:

Enter the cli and start the cluster with the command: cluster-start

cli:~>cluster-start

[suse11-1]Starting High-Availability services:

Configuration file checked ok.

..done

 

Configuration file checked ok.

[suse11-2]Starting High-Availability services:

..done

 

 

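2. Query the cluster status:

Cluster status covers the nodes, heartbeat links, work links, and services. Enter the cli and run cluster-stat (which refreshes periodically) to view it.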

cli:~>cluster-stat

                Press Ctrl-C or 'Q' to exit

                Date: Fri Apr 26 09:45:13 2013

 

 Member                  status

 suse11-1                 UP

 suse11-2                 UP

 

 WorkLink                suse11-1            suse11-2           

 bond0                   ONLINE               ONLINE 

 

 HeartbeatLink         suse11-1            suse11-2          status 

 network               eth0                eth0              ONLINE

 serial               /dev/ttyS0           /dev/ttyS0        ONLINE

 LockDisk             /dev/sdb1            /dev/sdb1         ONLINE 

 

 ServiceName          suse11-1            suse11-2           Enable 

*oracle               running             stopped            YES

 tomcat               stoped              running            YES

 

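Reading the status display: both nodes (Member) are UP (normal), the work link (WorkLink) bond0 is ONLINE (normal) on both nodes, all heartbeat links (HeartbeatLink) are ONLINE (normal), service oracle is currently running on node suse11-1, and service tomcat is currently running on node suse11-2.

V. Cluster testing

The main goal is to verify that services can be switched over properly. Only then can a service be taken over and keep running when the cluster has a fault (for example, one server goes down or a running service stops unexpectedly). The test procedure follows: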
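The same reading of the service table can be done by a script. This is a sketch under the assumption that one cluster-stat snapshot has been pasted into a text file; the function name is hypothetical, and it relies only on the column layout shown above (ServiceName header followed by one row per service).

```shell
# service_locations FILE: from a saved cluster-stat snapshot, print
# "<service> <node>" for every service that is running on some node.
service_locations() {
    awk '
        $1 == "ServiceName" { n1 = $2; n2 = $3; in_services = 1; next }
        in_services && NF >= 4 {
            sub(/^\*/, "", $1)               # drop the leading "*" marker
            if ($2 == "running") print $1, n1
            if ($3 == "running") print $1, n2
        }
    ' "$1"
}
```

Run against the snapshot above, `service_locations snapshot.txt` (file name hypothetical) would list oracle on suse11-1 and tomcat on suse11-2.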

 

1. Check the cluster status:

cli:~>cluster-stat

                Press Ctrl-C or 'Q' to exit

                Date: Fri Apr 26 11:45:13 2013

 

 Member                  status

 suse11-1                 UP

 suse11-2                 UP

 

 WorkLink                suse11-1            suse11-2           

 bond0                   ONLINE               ONLINE 

 

 HeartbeatLink         suse11-1            suse11-2          status 

 network               eth0                eth0              ONLINE

 serial               /dev/ttyS0           /dev/ttyS0        ONLINE

 LockDisk             /dev/sdb1            /dev/sdb1         ONLINE 

 

 ServiceName          suse11-1            suse11-2           Enable 

*oracle               running             stopped            YES

 tomcat               stoped              running            YES

 

Service oracle is currently running on node suse11-1, and tomcat is running on node suse11-2.

 

2. Switch over the services, command: service-migrate

cli:~>service-migrate

Select service to migrate:

Current service:

        0) oracle

        1) tomcat

        2) cancel

Select a service [0, 2]:0   (switch over service oracle)

Select the destination node:

Current node:

        0) suse11-2

        1) cancel

Select a node [0, 1]:0

Send message to migrate service oracle from suse11-1 to suse11-2.

cli:~>service-migrate

Select service to migrate:

Current service:

        0) oracle

        1) tomcat

        2) cancel

Select a service [0, 2]:1   (switch over service tomcat)

Select the destination node:

Current node:

        0) suse11-1

        1) cancel

Select a node [0, 1]:0

Send message to migrate service tomcat from suse11-2 to suse11-1.

 

3. Check the switchover result

cli:~>cluster-stat

                Press Ctrl-C or 'Q' to exit

                Date: Fri Apr 26 11:46:20 2013

 

 Member                  status

 suse11-1                 UP

 suse11-2                 UP

 

 WorkLink                suse11-1            suse11-2           

 bond0                   ONLINE               ONLINE 

 

 HeartbeatLink         suse11-1            suse11-2          status 

 network               eth0                eth0              ONLINE

 serial               /dev/ttyS0           /dev/ttyS0        ONLINE

 LockDisk             /dev/sdb1            /dev/sdb1         ONLINE 

 

 ServiceName          suse11-1            suse11-2           Enable 

oracle                stoped             running            YES

 *tomcat               running            stoped             YES

 

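Both services were switched over successfully: oracle now runs on suse11-2 and tomcat on suse11-1. Perform the switchover above at least once on each server. It is also advisable to simulate some common faults, for example whether HA starts automatically and rejoins the cluster after a node reboot, and whether services switch over to the standby node when the active host is rebooted or shut down.

 

That concludes this tour of NewStart HA. Enjoy it.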
