2006-05-02 20:57:17

Sun Cluster 2.x and 3.x: Explanation of the Process Monitor Facility (PMF)
 
This document explains, in plain English, how the Process Monitor Facility (known as pmf, pmfd or rpc.pmfd) exercises control over other system processes.
 
PMF is a cluster (2.x and 3.x) facility to monitor processes within the cluster framework and provide for restarts of the processes where possible.
 
PMF is also tied to the failfast driver to alert the failfast driver when a critical cluster process has died.
 
This functionality has been expanded in Cluster 3.x to provide for monitoring of data services (resources), notably the generic data service.
This has led to an increase in PMF's visibility and an increase in the need for information on its operation.
 
BASICS
 
PMF is an RPC (Remote Procedure Call) service called rpc.pmfd. It is located in /opt/SUNWcluster/bin (2.2), /usr/cluster/lib/sc (3.x, 32 bit) or /usr/cluster/lib/sc/sparcv9 (3.x, 64 bit).
 
In Sun Cluster 2.2, it is started by an rc (run control) script in /etc/rc3.d called S23initpmf.
In Sun Cluster 3.0, this script is called S17initpmf.
 
During startup, the process (rpc.pmfd) sets aside some memory for itself,
puts itself into real-time mode and waits for processes to be registered through pmfadm (the administrative command).
 
Cluster processes that need pmf monitoring get started with pmfadm to place them in pmfd's monitoring list.
 
MONITORING
 
pmf monitors processes through the use of tags handed to it by pmfadm and by attaching itself to pids (process identifiers) in the /proc filesystem.
 
Because only one monitoring process can be attached to a given pid in the /proc filesystem at a time, truss cannot be used on a process that was started under pmf control.
 
pmf uses the /proc filesystem monitoring because one of the options that is available when starting a process under pmf control is to monitor that process' children, or sub-processes. In order to do this, pmf listens to each process (using a method similar to truss) to detect fork() system calls, indicating that a child process is being created. Once a child has been detected, pmf then gathers information about that process and keeps track of it as well.
 
Different levels of child monitoring can be specified for pmf's behavior.
The default is for pmf to monitor a process and all children. In that event, the original process is not restarted until it and all its children have died.
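
The bookkeeping described above can be sketched in a few lines of Python. This is only an illustration of the idea, not pmf's actual implementation: the class and method names are hypothetical, the fork and exit events are fed in by hand rather than read from /proc, and the sketch assumes that level 0 means the original process only.

```python
from typing import Dict, Optional


class TagTracker:
    """Hypothetical model of the processes tracked under one nametag."""

    def __init__(self, root_pid: int, max_level: Optional[int] = None):
        # max_level=None models the default: follow all descendants.
        self.max_level = max_level
        # Live pid -> depth below the original process (original = 0).
        self.level: Dict[int, int] = {root_pid: 0}

    def on_fork(self, parent_pid: int, child_pid: int) -> bool:
        """Record a detected fork(); return True if the child is tracked."""
        depth = self.level.get(parent_pid)
        if depth is None:
            return False  # the parent is not one of ours
        if self.max_level is not None and depth + 1 > self.max_level:
            return False  # child is deeper than the requested level
        self.level[child_pid] = depth + 1
        return True

    def on_exit(self, pid: int) -> bool:
        """Record an exit; return True once nothing tracked is left alive."""
        self.level.pop(pid, None)
        return not self.level


tracker = TagTracker(root_pid=100)
tracker.on_fork(100, 101)      # child of the original process
tracker.on_fork(101, 102)      # grandchild
print(tracker.on_exit(100))    # original died, descendants still alive
print(tracker.on_exit(101))
print(tracker.on_exit(102))    # last one gone: restart logic may now run
```

Only the final exit reports that the tag is empty, which matches the rule that the original process is not restarted until it and all its monitored children have died.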
 
RESTARTING
 
The truly interesting behavior of pmf is centered around what to do when a particular process that is being monitored by pmf dies. This action is determined by:
1. Are children being monitored?
2. What action is specified for this process when it dies?
3. How much time has elapsed since it died last?
4. How many times have we tried to restart it?
These are all configurable using pmfadm. The general logic is as follows:
- If a process dies, check to see if we are still monitoring it.
- If we are supposed to stop monitoring it, quit.
- If we are still monitoring the process that died, check to see if we are monitoring its parent or children.
- If we are monitoring the parent or children, check to see if they are running.
- If the parent or children are running, quit.
- If this is the last of the parent/child processes that have died, check to see if there is an action to be performed (specified by the -a option to pmfadm).
- If there is no action, quit.
- If there is an action, execute it.
- If the action is successful, check to see if we have restarted this process too many times (configurable with -n in pmfadm) in the specified time interval (-t option to pmfadm).
- If we have exceeded our restart limits, quit.
- If we have not exceeded our restart limits, restart the process with the original arguments, and count another failure in the timeout period.
- If the action is not successful, remove the process from monitoring.
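
The decision sequence above can be written out as a small Python function. This is a sketch that follows the list as written in this document, not the real rpc.pmfd code: all names are hypothetical, the action is modelled as a callable that returns True on success, and the parent/child check is assumed to have already determined that the last process under the tag has died.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

INFINITE = -1  # value used by -n and -t to mean "no limit"


@dataclass
class MonitoredTag:
    retries: int = 0                             # -n; 0 = never restart
    period_minutes: float = INFINITE             # -t; window for counting
    action: Optional[Callable[[], bool]] = None  # -a; True means success
    monitoring: bool = True                      # cleared by pmfadm -s
    failure_times: List[float] = field(default_factory=list)


def on_last_process_exit(tag: MonitoredTag, now_minutes: float) -> str:
    """Return 'restart', 'quit' or 'remove' for a tag whose processes all died."""
    if not tag.monitoring:
        return "quit"                    # we were told to stop monitoring
    if tag.action is None:
        return "quit"                    # no action specified
    if not tag.action():
        tag.monitoring = False
        return "remove"                  # action failed: drop the tag
    if tag.period_minutes != INFINITE:
        # Only failures inside the time window count against the limit.
        tag.failure_times = [
            t for t in tag.failure_times
            if now_minutes - t < tag.period_minutes
        ]
    if tag.retries != INFINITE and len(tag.failure_times) >= tag.retries:
        return "quit"                    # restart limit exceeded
    tag.failure_times.append(now_minutes)
    return "restart"                     # restart with original arguments


tag = MonitoredTag(retries=2, period_minutes=10, action=lambda: True)
for minute in (0, 1, 2, 20):
    print(minute, on_last_process_exit(tag, minute))
```

With two retries allowed in a ten-minute window, the first two failures lead to a restart, the third is refused, and a failure at minute 20 is allowed again because the earlier ones have aged out of the window.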
 
pmfadm -- the administrative interface
 
Once rpc.pmfd is started, everything else occurs via pmfadm commands.
A simplified arguments list follows:
 -a action : The name of the action script to execute as part of the restart logic.
 -c nametag : Start a process and use nametag as its identifier.
 -C level : Keep track of this level of children. Default is all; i.e., children, children's children, and so on.
 -e ENV_VAR=env.value : An environment variable in the form ENV_VAR=env.value which is passed to the new process. This option can be repeated.
 -E : Pass the whole pmfadm environment to the new process.
NOTE: The -e and -E options are mutually exclusive.
 -h host : The name of the host to contact. Default is localhost.
 -k nametag signal : Send the specified signal to the processes associated with nametag, including any processes associated with the action program if it is currently running. The default signal, SIGKILL, is sent if none is specified. If the process and its descendants exit, and there are remaining retries available, the process monitor restarts the process. The signal is specified using the same set of names recognized by the kill(1) command.
 -l nametag : Print out status information about nametag.
 -L : Return a list of all running tags that belong to the user who issued the command or, if the user is root, all tags running on the server.
 -m nametag : Modify the number of retries, or time period over which to observe retries, for nametag.
 -n retries : Number of retries allowed within the specified time period. The default value for this field is 0, which means that the process is not restarted once it exits. A value of -1 indicates that the number of retries is infinite.
 -q nametag : Indicate whether nametag is registered and running under the process monitor. Returns 0 if it is, 1 if it is not.
 -s nametag : Stop restarting the command associated with nametag.
 -t period : Minutes over which to count failures. The default value is -1, which equates to infinity.
 -w timeout : When used in conjunction with the -s nametag or -k nametag flags, wait up to the specified number of seconds for the processes associated with nametag to exit. 
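
To show how the options above combine when a process is registered, here is a small Python helper that only builds the argument list for a pmfadm -c call and does not run anything. The helper name, the tag and the paths are hypothetical, and the sketch assumes that the command to be monitored (with its own arguments) follows the options on the command line.

```python
from typing import Dict, List, Optional


def build_pmfadm_start(
    nametag: str,
    command: List[str],
    action: Optional[str] = None,
    retries: Optional[int] = None,
    period: Optional[int] = None,
    env: Optional[Dict[str, str]] = None,
    pass_env: bool = False,
) -> List[str]:
    """Build (but do not execute) an argument list for 'pmfadm -c'."""
    if env and pass_env:
        # The option list states that -e and -E are mutually exclusive.
        raise ValueError("-e and -E are mutually exclusive")
    argv = ["pmfadm", "-c", nametag]
    if action is not None:
        argv += ["-a", action]
    if retries is not None:
        argv += ["-n", str(retries)]
    if period is not None:
        argv += ["-t", str(period)]
    if pass_env:
        argv.append("-E")
    for name, value in (env or {}).items():
        argv += ["-e", f"{name}={value}"]  # -e may be repeated
    return argv + list(command)


# Hypothetical tag and paths, for illustration only.
argv = build_pmfadm_start(
    "mytag",
    ["/opt/app/bin/daemon", "-f"],
    action="/opt/app/bin/on_fail",
    retries=3,
    period=5,
)
print(" ".join(argv))
```

Keeping the command as a list rather than a single string avoids quoting problems if the result is later handed to something like subprocess.run on a cluster node.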