Category: LINUX
2009-12-05 14:44:31
MOSIX cluster (1): Installation

Goal: let processes on the cluster nodes migrate automatically according to load.

Install one RHEL5 machine (192.168.100.5) in VMware.

# Download the MOSIX and kernel sources to prepare for the build
# Unpack them into /usr/src
[root@rhel5 ~]# tar xjvf MOSIX-2.24.2.2.tbz -C /usr/src/
[root@rhel5 ~]# tar xzvf linux-2.6.26.tar.gz -C /usr/src/
# Go to the source directory
[root@rhel5 ~]# cd /usr/src/
# other/patch-2.6.26 targets the path linux-2.6.26.1, so create a symlink
# (MOSIX apparently has no separate patch for 2.6.26, but 2.6.26 is still supported)
[root@rhel5 src]# ln -s linux-2.6.26/ ./linux-2.6.26.1
# Apply the MOSIX patch to the kernel
[root@rhel5 src]# patch -p0 < /usr/src/mosix-2.24.2.2/other/patch-2.6.26
# Enter the kernel source directory and start the build
[root@rhel5 src]# cd linux-2.6.26
# Generate the configuration file
[root@rhel5 linux-2.6.26]# make menuconfig
# Generate dependencies (unnecessary on 2.6 kernels, where kbuild tracks dependencies itself; harmless to run)
[root@rhel5 linux-2.6.26]# make dep
# Build the kernel image
[root@rhel5 linux-2.6.26]# make bzImage
# Build the kernel modules
[root@rhel5 linux-2.6.26]# make modules
# Install the kernel modules
[root@rhel5 linux-2.6.26]# make modules_install
# Install the kernel
[root@rhel5 linux-2.6.26]# make install
# Go to the MOSIX directory
[root@rhel5 linux-2.6.26]# cd ../mosix-2.24.2.2
# Install MOSIX: press Enter at every prompt and just install; remember to enable
# the mosix service in the runlevels you normally use. Configuration comes later.
[root@rhel5 mosix-2.24.2.2]# ./mosix.install

Shut the machine down, then clone rhel5 (192.168.100.5) into slave (192.168.100.6).

Installation complete.

MOSIX-2.24.2.2/linux-2.6.26 cluster (2): Configuration

Start rhel5 and slave. At boot, press Enter at the GRUB screen and choose the 2.6.26 kernel.
Once slave is up, change its IP address and hostname (it was cloned from rhel5, after all).

[rhel5]
# Configure MOSIX
[root@rhel5 ~]# mosconf
MOSIX CONFIGURATION
===================
If this is your cluster's file-server and you want to configure MOSIX for a set of nodes with a common root, please type their common root directory.
Otherwise, if you want to configure the node that you are running on, just press <ENTER> :-
What would you like to configure?
=================================
1. Which nodes are in this cluster (ESSENTIAL)
2. Authentication (ESSENTIAL)
3. Logical node numbering (recommended)
4. Queueing policies (recommended)
5. Freezing policies
6. Miscellaneous policies
7. Become part of a multi-cluster organizational Grid
Configure what :- 1
There are no nodes in your cluster yet:
=======================================
To add a new set of nodes to your cluster, type 'n'.
To turn on advanced options, type '+'.
For help, type 'h'.
To save and exit, type 'q'. (to abandon all changes and exit, type 'Q')
Option :- n  <== add nodes
Adding new node(s) to the cluster:
First host-name or IP address :- 192.168.100.5  <== node IP
Number of nodes :- 1  <== number of nodes
Nodes in your cluster:
======================
1. 192.168.100.5
To add a new set of nodes to your cluster, type 'n'.
To modify an entry, type its number.
To delete an entry, type 'd' followed by that entry-number (eg. d1).
To turn on advanced options, type '+'.
For help, type 'h'.
To save and exit, type 'q'. (to abandon all changes and exit, type 'Q')
Option :- n  <== add nodes
Adding new node(s) to the cluster:
First host-name or IP address :- 192.168.100.6  <== node IP
Number of nodes :- 1  <== number of nodes
Nodes in your cluster:
======================
1. 192.168.100.5
2. 192.168.100.6
To add a new set of nodes to your cluster, type 'n'.
To modify an entry, type its number.
To delete an entry, type 'd' followed by that entry-number (eg. d2).
To turn on advanced options, type '+'.
For help, type 'h'.
To save and exit, type 'q'. (to abandon all changes and exit, type 'Q')
Option :- q  <== save and exit
Cluster configuration was saved.
OK to also update the logical node numbers [Y/n]? y
Suggesting to assign '192.168.100.5' as the central queue manager for the cluster
(but be cautious if you mix 32-bit and 64-bit nodes in the same cluster)
OK to update it now [Y/n]?
What would you like to configure next?
======================================
1. Which nodes are in this cluster
2. Authentication (ESSENTIAL)
3. Logical node numbering
4. Queueing policies
5. Freezing policies
6. Miscellaneous policies
7. Become part of a multi-cluster organizational Grid
q. Exit
Configure what :- 2  <== set the keys
MOSIX Authentication:
=====================
To protect your MOSIX cluster from abuse, preventing unauthorized persons from gaining control over your computers, you need to set up a secret cluster-protection key.
This key can include any characters, but must be identical throughout your cluster.
Your secret cluster-protection key: xxxx  <== enter the key
Your key is 5 characters long. (in the future, please consider a longer one)
To allow your users to send batch-jobs to other nodes in the cluster, you must set up a secret batch-client key.
This key can include any characters, but must match the 'batch-server' key on the node(s) that can receive batch-jobs from this node.
Your secret batch-client key: xxxx  <== enter the key
Your key is 5 characters long. (in the future, please consider a longer one)
For this node to accept batch jobs, you must set up a secret batch-server key.
This key can include any characters, but must match the 'batch-client' key on the sending nodes.
To make your batch-server key the same as your batch-client key, type '+'.
Your secret batch-server key: xxxx  <== enter the key
Your key is 5 characters long. (in the future, please consider a longer one)
# Save and exit, then restart the service
[root@rhel5 ~]# service mosix restart

[slave]
[root@slave ~]# mosconf
....
# Same steps as on rhel5
# Restart the service
[root@slave ~]# service mosix restart
# Check the status
[root@slave ~]# service mosix status
This MOSIX node is: 192.168.100.6 (no features)
Nodes in cluster:
=================
192.168.100.5: proximate
192.168.100.6: proximate
Status: Running Normally (32-bits)
Load: 0.01 (equivalent to about 0.0066 CPU processes)
Speed: 6650 units
CPUS: 1
Frozen: 0
Util: 100%
Avail: YES
Procs: Running 0 MOSIX processes
Accept: Yes, will welcome processes from here
Memory: Available 461MB/503MB
Swap: Available 0.9GB/0.9GB
Daemons:
  Master Daemon: Up
  MOSIX Daemon : Up
  Queue Manager: Up
  Remote Daemon: Up
  Postal Daemon: Up
Guest processes from other clusters in the grid: 0/8
# I also like to check that the ports are actually up
# TCP/IP ports 249-253 and UDP/IP ports 249-250 must be available for MOSIX
# (the pattern "24|25" is loose: it also matches ports 2401 and 25 below)
[root@slave ~]# netstat -antu | grep -E "24|25"
tcp 0 0 0.0.0.0:2401 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:249 0.0.0.0:* LISTEN
tcp 0 0 127.0.0.1:25 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:250 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:251 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:252 0.0.0.0:* LISTEN
udp 0 0 0.0.0.0:249 0.0.0.0:*
udp 0 0 0.0.0.0:250 0.0.0.0:*
# Done, the installation is finished

MOSIX-2.24.2.2/linux-2.6.26 cluster (3): Application testing

# First open a terminal on each of rhel5 and slave and run mon to watch them
[root@rhel5 ~]# mon
# Both nodes should be idle at this point
# To see an effect, write a CPU-hungry script; it must start several processes,
# because the smallest unit MOSIX can migrate is a process, not an instruction or a function,
# so a single process does not help, however heavy its load
[root@rhel5 ~]# cat > a.sh << 'EOF'
#!/bin/sh
awk 'BEGIN {for(i=0;i<100000;i++)for(j=0;j<100000;j++);}' &
awk 'BEGIN {for(i=0;i<100000;i++)for(j=0;j<100000;j++);}' &
awk 'BEGIN {for(i=0;i<100000;i++)for(j=0;j<100000;j++);}' &
awk 'BEGIN {for(i=0;i<100000;i++)for(j=0;j<100000;j++);}' &
awk 'BEGIN {for(i=0;i<100000;i++)for(j=0;j<100000;j++);}' &
awk 'BEGIN {for(i=0;i<100000;i++)for(j=0;j<100000;j++);}' &
EOF
[root@rhel5 ~]# chmod +x a.sh
# Run a.sh on rhel5, which starts 6 processes
[root@rhel5 ~]# mosrun -e ./a.sh
# Watch mon on both nodes: at first the load on rhel5 is high, then the load on slave rises too
# On rhel5 all 6 awk processes are still listed, but only 3 are running;
# the other 3 are in state T (stopped), so they have presumably migrated
# (`ps aux`, without the dash, avoids the procps warning below)
[root@rhel5 ~]# ps -aux | grep awk
Warning: bad syntax, perhaps a bogus '-'?
See /usr/share/doc/procps-3.2.7/FAQ
root 25648 0.6 0.0 0 0 pts/0 T 16:16 0:00 [awk]
root 25650 0.4 0.0 0 0 pts/0 T 16:16 0:00 [awk]
root 25652 32.0 0.7 4168 3812 pts/0 R 16:16 0:37 awk BEGIN {for(i=0;i<100000;i++)for(j=0;j<100000;j++);}
root 25654 32.0 0.7 4168 3816 pts/0 R 16:16 0:37 awk BEGIN {for(i=0;i<100000;i++)for(j=0;j<100000;j++);}
root 25656 32.0 0.7 4168 3816 pts/0 R 16:16 0:37 awk BEGIN {for(i=0;i<100000;i++)for(j=0;j<100000;j++);}
root 25658 1.4 0.0 0 0 pts/0 T 16:16 0:01 [awk]
root 25665 0.0 0.1 3860 624 pts/0 R+ 16:18 0:00 grep awk
# Now run top on slave: three processes named remoted are clearly using the CPU.
# These should be the migrated processes.
top - 16:19:19 up 3:10, 3 users, load average: 2.78, 1.18, 0.44
Tasks: 99 total, 5 running, 94 sleeping, 0 stopped, 0 zombie
Cpu(s): 99.3%us, 0.3%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.3%si
Mem: 515376k total, 423576k used, 91800k free, 107980k buff
Swap: 1048568k total, 0k used, 1048568k free, 234028k cach

  PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
16929 root 20 0 4168 3936 0 R 33.2 0.8 0:48.13 remoted
16925 root 20 0 4168 3932 0 R 32.9 0.8 0:50.57 remoted
16927 root 20 0 4168 3932 0 R 32.9 0.8 0:50.13 remoted
    1 root 20 0 2036 664 572 S 0.0 0.1 0:01.36 init
    2 root 15 -5 0 0 0 S 0.0 0.0 0:00.00 kthreadd
    3 root RT -5 0 0 0 S 0.0 0.0 0:00.00 migratio
    4 root 15 -5 0 0 0 S 0.0 0.0 0:02.00 ksoftirq

############## End of test ##############
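The six copy-pasted awk lines in a.sh from part (3) can also be generated by a loop. The sketch below is a hypothetical variant, not something shipped with MOSIX: the function name burn_cpu and its two arguments (number of processes, loop bound) are illustrative. Each awk is still a separate process, which is what MOSIX migrates. Unlike the original a.sh it ends with wait, so the script stays in the foreground until every awk process has exited.

```shell
#!/bin/sh
# Hypothetical parametrized version of a.sh (names are illustrative).
# burn_cpu NPROCS ITER
#   NPROCS: number of independent CPU-bound awk processes (default 6)
#   ITER:   bound of each nested loop, i.e. ITER*ITER empty iterations per process
burn_cpu() {
    nprocs=${1:-6}
    iter=${2:-100000}
    n=0
    while [ "$n" -lt "$nprocs" ]; do
        # Each awk runs as its own process, the unit that MOSIX can migrate
        awk -v m="$iter" 'BEGIN { for (i = 0; i < m; i++) for (j = 0; j < m; j++) { } }' &
        n=$((n + 1))
    done
    # Block until every background awk has exited
    wait
    echo "finished $nprocs processes"
}
```

To repeat the test, a.sh would contain this function followed by the call `burn_cpu 6 100000`, and is then started with `mosrun -e ./a.sh` as before. A small run such as `burn_cpu 3 10` finishes almost immediately, which is handy for checking the script itself before loading the cluster.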
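The ps check on rhel5 in part (3) can be turned into a count of running versus stopped awk processes. This is a minimal sketch, assuming the procps `ps aux` column layout seen in the listing above (STAT is column 8, the command name is column 11); the function name count_awk_states is hypothetical. It uses the shell's own read instead of awk or grep, so the counter does not show up in the ps output it is counting.

```shell
#!/bin/sh
# count_awk_states (hypothetical helper): reads `ps aux`-style lines on stdin
# and prints how many awk processes are running (R) and stopped (T).
# Assumes the procps `ps aux` layout: col 8 = STAT, col 11 = command name.
count_awk_states() {
    r=0
    t=0
    # Split each line into the fixed `ps aux` columns; the rest goes into $rest
    while read -r user pid cpu mem vsz rss tty stat start cputime cmd rest; do
        # "awk" = still running locally, "[awk]" = listed without its
        # command line (the stopped ones in the test above)
        case $cmd in
            awk|'[awk]') ;;
            *) continue ;;
        esac
        # Only the first STAT letter matters: R (running), T (stopped), ...
        case $stat in
            R*) r=$((r + 1)) ;;
            T*) t=$((t + 1)) ;;
        esac
    done
    echo "R=$r T=$t"
}
```

On rhel5 this would be run as `ps aux | count_awk_states`. Fed the seven lines from the listing above (including the grep line, which is skipped), it prints R=3 T=3.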