Below are descriptions of the states that a Cluster Server node can end up in after a reboot, as seen in the output of the following command:
# hastatus
attempting to connect....connected

group           resource             system          message
--------------- -------------------- --------------- --------------------
                                     sptsunvcs3      STALE ADMIN WAIT: all systems stale
                                     sptsunvcs4      STALE ADMIN WAIT: all systems stale
ADMIN_WAIT state:
If VCS is started on a system with a valid configuration file, and if other systems are in the ADMIN_WAIT state, the new system transitions to the ADMIN_WAIT state.
INITING===>CURRENT_DISCOVER_WAIT===>ADMIN_WAIT
If VCS is started on a system with a stale configuration file, and if other systems are in the ADMIN_WAIT state, the new system transitions to the ADMIN_WAIT state.
INITING===>STALE_DISCOVER_WAIT===>ADMIN_WAIT
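The current state of each system can be checked from any node with the hasys command. For example, using the host names from the output above (the exact column layout may vary between VCS releases):

# hasys -state
#System        Attribute     Value
sptsunvcs3     SysState      STALE_ADMIN_WAIT
sptsunvcs4     SysState      STALE_ADMIN_WAIT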
STALE_ADMIN_WAIT state:
If VERITAS Cluster Server is started on a system with a stale configuration file, and if all other systems are in the STALE_ADMIN_WAIT state, the system transitions to the STALE_ADMIN_WAIT state as shown below. A system stays in this state until another system with a valid configuration file is started, or until the hasys -force command is issued.
INITING===>STALE_DISCOVER_WAIT===>STALE_ADMIN_WAIT
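Note that, as described above, simply starting VCS on a system that does have a valid main.cf is enough to move the other systems out of STALE_ADMIN_WAIT, since they then build their configuration from that system. On the system with the valid file:

# hastart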
Resolution:
If all systems are in STALE_ADMIN_WAIT or ADMIN_WAIT, first validate the configuration file (/etc/VRTSvcs/conf/config/main.cf) on all systems in the cluster: run the hacf -verify . command to check for syntax errors (the command must be run from the directory containing the main.cf file), and review the file's contents for proper resource and service group definitions.
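For example, assuming the default configuration directory:

# cd /etc/VRTSvcs/conf/config
# hacf -verify .

If the file is syntactically correct, hacf -verify returns silently; otherwise the syntax errors are printed.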
Then enter the following command on the system with the correct configuration file to force start VCS.
# hasys -force system_name
This starts Cluster Server on that node and, in turn, starts Cluster Server on all other nodes that are in the ADMIN_WAIT or STALE_ADMIN_WAIT state.
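Once the cluster is up, hastatus can be run again (the -sum option gives a one-screen summary) to confirm that all systems have left the wait states and report a state of RUNNING:

# hastatus -sum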
One of the most common causes of a node being in one of these states is the existence of /etc/VRTSvcs/conf/config/.stale. This file is typically left behind if Cluster Server is stopped while the configuration is still open, i.e. someone has forgotten to save changes made to the running configuration. The .stale file is deleted automatically when changes are saved correctly, so a properly saved configuration will not force the node into an ADMIN_WAIT or STALE_ADMIN_WAIT state the next time Cluster Server starts. Provided the main.cf file has been verified as described above, the .stale file can be safely removed.
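For example, to check for and remove the marker file once main.cf has been verified:

# ls -l /etc/VRTSvcs/conf/config/.stale
# rm /etc/VRTSvcs/conf/config/.stale

To avoid leaving a .stale file behind in the first place, save and close the running configuration before stopping Cluster Server:

# haconf -dump -makero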