2010-07-05 18:37:17

WATCHDOG(8)                                                        WATCHDOG(8)


NAME
       watchdog - a software watchdog daemon

SYNOPSIS
       watchdog [ -f | --force ] [ -c <config file> | --config-file <config
       file> ] [ -v | --verbose ] [ -s | --sync ] [ -b | --softboot ]
       [ -q | --no-action ]

DESCRIPTION
       Watchdog is a daemon that checks whether your system is still working.
       If programs in user space are no longer executed, it will hard reset
       the system.

       The kernel provides /dev/watchdog, which, once open(2)ed, must be
       written to within a minute or the machine will reboot. Each write
       delays the reboot time by another minute. After a minute without a
       write, the watchdog hardware will cause the reset. In the case of the
       software watchdog, the ability to reboot will depend on the state of
       the machine and its interrupts.

       Watchdog can be stopped without causing a reboot if the device
       /dev/watchdog is closed correctly, unless of course your kernel is
       compiled with the CONFIG_WATCHDOG_NOWAYOUT option enabled.
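
       The write-within-a-minute protocol described above can be sketched as
       follows. This is only an illustration, not watchdog's own code: the
       function name feed_watchdog and its parameters are made up for this
       example, and the 'V' byte written before close is the "magic close"
       convention of current Linux watchdog drivers, assumed here to be what
       "closed correctly" refers to.

```python
import os
import time


def feed_watchdog(path="/dev/watchdog", interval=10.0, rounds=3, sleep=time.sleep):
    """Open the device, write to it `rounds` times, then close it cleanly.

    Hypothetical helper: every name here is illustrative.
    Returns the number of keep-alive writes performed.
    """
    fd = os.open(path, os.O_WRONLY)
    writes = 0
    try:
        for _ in range(rounds):
            os.write(fd, b"\0")  # any write postpones the reboot
            writes += 1
            sleep(interval)      # must stay well below one minute
    finally:
        # Magic close: a 'V' written right before close() asks the driver
        # to disarm the timer. With CONFIG_WATCHDOG_NOWAYOUT it is ignored.
        os.write(fd, b"V")
        os.close(fd)
    return writes
```

       Opening the real device arms the timer, so try the function against an
       ordinary file first.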


TESTS
       Watchdog itself does several additional tests to check the system  sta-
       tus:

       Check whether the process table is full.

       Check whether there is enough memory available.

       Check whether some given files are accessible.

       Check whether some given files change within a given interval.

       Check whether the average work load exceeds a predefined maximal value.

       Check whether a file table overflow occurred.

       Check whether a given process (specified by a pid file) is still
       running.

       Check whether some given IP addresses answer to a ping message.

       Check whether some given network interfaces received some traffic.

       Check the temperature (if available).

       Execute a user defined command to do arbitrary tests.

       If any of these checks fails, watchdog will cause a shutdown. Should
       any of these tests except the user defined binary last longer than one
       minute, the machine will be rebooted, too.



OPTIONS
       Available command line options are the following:

       -v | --verbose
              Set verbose mode. Only implemented if compiled with the SYSLOG
              feature. In this mode watchdog logs several pieces of
              information to LOG_DAEMON with priority LOG_INFO. This is useful
              if you want to see exactly what happened until watchdog rebooted
              the system. Currently it logs the temperature (if available),
              the load average, the change date of the files it checks and how
              often it went to sleep.

       -s | --sync
              Try to synchronize the filesystem every time the process is
              awake. Be aware that the system is rebooted if for any reason
              syncing lasts longer than a minute.

       -b | --softboot
              Soft-boot the system if an error occurs during the main loop,
              e.g. if the file given with -n is not accessible via the stat(2)
              call. Note that this does not apply to the open(2) calls to
              /dev/watchdog and /proc/loadavg, which are opened before the
              main loop starts.

       -f | --force
              Force the usage of the interval given or the maximal load
              average given in the config file.

       -c <config file> | --config-file <config file>
              Use <config file> as the config file instead of the default
              /etc/watchdog.conf.

       -q | --no-action
              Do not reboot or halt the machine. This is for testing purposes.
              All checks are executed and the results are logged as usual, but
              no action is taken. Also, your hardware card or the kernel
              software watchdog driver is not enabled. Note that temperature
              checking is also disabled, since it triggers the hardware
              watchdog on some cards.


FUNCTION
       After watchdog starts, it puts itself into the background and then
       tries all checks specified in its config file in turn. Between each two
       tests it will trigger the kernel device. After finishing all tests
       watchdog goes to sleep for some time. The kernel driver expects a write
       to the watchdog device every minute. Otherwise the system will be
       rebooted. By default watchdog sleeps for only 10 seconds, so it
       triggers the device early enough.
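
       The cycle described above (run the checks in turn, trigger the device
       between them, then sleep) might look roughly like this. It is a
       simplified sketch under stated assumptions: run_cycle, main_loop, the
       (name, callable) check list and the trigger callback are all
       hypothetical names, and the real daemon does considerably more.

```python
import time


def run_cycle(checks, trigger):
    """Run every check once, triggering the device after each test.

    `checks` is a list of (name, callable) pairs; a callable returns True
    when the system looks healthy. Returns the name of the first failing
    check, or None if all passed. All names are illustrative.
    """
    for name, check in checks:
        ok = check()
        trigger()  # keep the kernel timer from expiring between tests
        if not ok:
            return name
    return None


def main_loop(checks, trigger, interval=10.0, cycles=None, sleep=time.sleep):
    """Repeat run_cycle, sleeping `interval` seconds between rounds.

    Stops and returns the failing check's name, or returns None after
    `cycles` clean rounds (cycles=None means run until a check fails).
    """
    done = 0
    while cycles is None or done < cycles:
        failed = run_cycle(checks, trigger)
        if failed is not None:
            return failed
        sleep(interval)
        done += 1
    return None
```

       With the default 10 second sleep the device is written several times
       per minute, provided no single check blocks for long.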

       Under high system load watchdog might be swapped out of memory and may
       fail to make it back in in time. Under these circumstances the Linux
       kernel will hard reset the machine. To make sure you won't get
       unnecessary reboots, make sure you have the 'realtime' variable set to
       yes in the config file watchdog.conf. It adds real time support to
       watchdog. Thus it will lock itself into memory and there should be no
       problem even under the highest of loads.

       Also you can specify a maximal allowed load average. Once this load
       average is reached the system is rebooted. You may specify maximal load
       averages for 1 minute, 5 minutes or 15 minutes. The default is to
       disable this test. Be careful not to set this parameter too low. To set
       a value less than the predefined minimal value of 2, you have to use
       the -f option.
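
       As an illustration of the load average test, the sketch below parses
       the first three fields of /proc/loadavg and compares them with
       optional limits. The function names are invented for this example and
       the comparison uses "greater than"; it is not claimed to match
       watchdog's exact rule.

```python
def parse_loadavg(text):
    """Return the 1, 5 and 15 minute load averages from /proc/loadavg text."""
    fields = text.split()
    if len(fields) < 3:
        raise ValueError("loadavg contains no (or not enough) data")
    return tuple(float(f) for f in fields[:3])


def validate_limit(limit, force=False, minimum=2.0):
    """Reject limits below the predefined minimum of 2 unless forced (-f)."""
    if limit < minimum and not force:
        raise ValueError("limit %s is below %s; use -f to force" % (limit, minimum))
    return limit


def load_exceeded(text, max1=None, max5=None, max15=None):
    """True if any configured limit is exceeded; None disables a limit."""
    loads = parse_loadavg(text)
    limits = (max1, max5, max15)
    return any(lim is not None and load > lim for load, lim in zip(loads, limits))
```

       Leaving all three limits as None mirrors the default of having the test
       disabled.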

       You can also specify a minimal amount of virtual memory you want to
       have available as free. As soon as more virtual memory is used, action
       is taken by watchdog. Note, however, that watchdog does not distinguish
       between different types of memory usage. It just checks for free
       virtual memory.

       If you have a watchdog card with a temperature sensor you can specify
       the maximal allowed temperature. Once this temperature is reached the
       system is halted. The default value is 120. There is no unit
       conversion, so make sure you use the same unit as your hardware.
       Watchdog will issue warnings once the temperature reaches 90%, 95% and
       98% of this temperature.
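
       The warning thresholds can be worked through with a small example. The
       function below is a sketch with invented names and return values; it
       only shows how 90%, 95% and 98% of the configured maximum translate
       into concrete readings.

```python
WARN_PERCENTS = (90, 95, 98)


def temperature_status(current, maximum=120):
    """Classify a reading against the configured maximum.

    Returns "halt" at or above the maximum, "warn-N" for the highest
    warning percentage N that was reached, and "ok" otherwise. No unit
    conversion is done, so both values must use the hardware's unit.
    """
    if current >= maximum:
        return "halt"
    # Integer arithmetic avoids rounding surprises at the thresholds.
    reached = [p for p in WARN_PERCENTS if current * 100 >= maximum * p]
    if reached:
        return "warn-%d" % reached[-1]
    return "ok"
```

       With the default maximum of 120, warnings start at readings of 108,
       114 and 117.6.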

       When using file mode watchdog will try to stat(2) the given files.
       Errors returned by stat will not cause a reboot. For a reboot the stat
       call has to last at least one minute. This may happen if the file is
       located on an NFS mounted filesystem. If your system relies on an NFS
       mounted filesystem you might try this option. However, in such a case
       the sync option may not work if the NFS server is not answering.

       If you give watchdog a pidfile it will read the pid from this file and
       use kill(pid, 0) to see whether the process still exists. If not,
       action is taken by watchdog. So you can for instance restart the server
       from your repair-binary.
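
       The pidfile test relies on the fact that signal 0 performs the
       existence and permission checks of kill(2) without delivering a
       signal. A minimal sketch, with process_alive as a hypothetical helper
       name:

```python
import os


def process_alive(pidfile):
    """Return True if the process named in `pidfile` still exists."""
    with open(pidfile) as f:
        pid = int(f.read().strip())
    try:
        os.kill(pid, 0)  # signal 0: nothing is sent, only checks are made
    except ProcessLookupError:
        return False     # ESRCH: no such process
    except PermissionError:
        return True      # EPERM: it exists but belongs to another user
    return True
```

       A pid can be reused by an unrelated process, so a positive answer is
       only a hint that the original server is still running.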

       Watchdog will try periodically to fork itself to see whether the
       process table is full. This process will leave a zombie process until
       watchdog wakes up again and catches it.

       In ping mode watchdog tries to ping the given addresses. These
       addresses do not have to be a single machine. It is possible to ping a
       broadcast address instead to see if at least one machine in a subnet is
       still alive.

       Do not use this broadcast ping unless your MIS person a) knows about it
       and b) has given you explicit permission to use it!

       Watchdog will send out three ping packets and wait up to <interval>
       seconds for the reply, with <interval> being the time it goes to sleep
       between two times triggering the watchdog device. Thus an unreachable
       network will not cause a hard reset but a soft reboot.

       You can also test passively for an unreachable network by just
       monitoring a given interface for traffic. If no traffic arrives the
       network is considered unreachable, causing a soft reboot or action from
       the repair binary.
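
       One way to picture the passive test is to compare the receive counter
       of an interface between two samples. The sketch below assumes the
       counters are taken from text in the format of /proc/net/dev, where the
       first number after "name:" is the received byte count; whether
       watchdog reads exactly this source is an assumption, and both function
       names are illustrative.

```python
def rx_bytes(proc_net_dev_text, iface):
    """Return the received-bytes counter for `iface`."""
    for line in proc_net_dev_text.splitlines():
        name, sep, rest = line.partition(":")
        if sep and name.strip() == iface:
            return int(rest.split()[0])
    raise KeyError(iface)


def interface_alive(before_text, after_text, iface):
    """True if the interface received any traffic between two samples."""
    return rx_bytes(after_text, iface) > rx_bytes(before_text, iface)
```

       The two samples would be taken one sleep interval apart.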

       By using an external check binary watchdog can run user defined tests.
       These may last longer than the time slice defined for the kernel device
       without a problem. However, note that in this case error messages are
       generated into the syslog facility. If you have enabled softboot on
       error the machine will be rebooted if the binary doesn't exit in half
       the time watchdog sleeps between two tries triggering the kernel
       device.

       If you specify a repair binary it will be started instead of shutting
       down the system. If this binary is not able to fix the problem watchdog
       will still cause a reboot afterwards.

       If the machine is eventually halted an email is sent to notify a human
       that the machine is going down. Starting with version 4.4 watchdog will
       also notify the human in charge if the machine is rebooted.


SOFT REBOOT
       A soft reboot (i.e. controlled shutdown and reboot) is initiated for
       every error that is found. Since there might be no more processes
       available, watchdog does it all by itself. That means:

       1) Kill all processes with SIGTERM.

       2) After a short pause kill all remaining processes with SIGKILL.

       3) Record a shutdown entry in wtmp.

       4) Save the random seed from /dev/urandom. If the device is
          non-existent or the file name to save to is empty, this step is
          skipped.

       5) Turn off accounting.

       6) Turn off quota and swap.

       7) Unmount all partitions except the root partition.

       8) Remount the root partition read-only.

       9) Shut down all network interfaces.

       10) Finally reboot.


CHECK BINARY
       If the return code of the check binary is not zero watchdog will assume
       an error and reboot the system. Be careful with this if you are using
       the real-time properties of watchdog, since watchdog will wait for the
       return of this binary before proceeding. A positive exit code is
       interpreted as a system error code (see errno.h for details). Negative
       values are special to watchdog:

       -1 Reboot the system. This is not exactly an error message but a
          command to watchdog. If the return code is -1 watchdog will not try
          to run a shutdown script instead.

       -2 Reset the system. This is not exactly an error message but a command
          to watchdog. If the return code is -2 watchdog will simply refuse to
          write to the kernel device again.

       -3 Maximum load average exceeded.

       -4 The temperature inside is too high.

       -5 /proc/loadavg contains no (or not enough) data.

       -6 The given file was not changed in the given interval.

       -7 /proc/meminfo contains invalid data.

       -8 personal use
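
       Because a process exit status is only 8 bits wide on Unix, a check
       binary that exits with -1 is seen by its parent as 255. The sketch
       below shows one way to map a raw status back to the codes listed
       above. The mapping rule and the names to_signed and describe are
       assumptions made for illustration, not a description of watchdog's
       source.

```python
import os

# Meanings taken from the list above; the -8 entry is left out on purpose.
SPECIAL = {
    -1: "reboot",
    -2: "reset",
    -3: "maximum load average exceeded",
    -4: "temperature too high",
    -5: "/proc/loadavg contains no (or not enough) data",
    -6: "file was not changed in the given interval",
    -7: "/proc/meminfo contains invalid data",
}


def to_signed(status):
    """Interpret an 8-bit exit status (0..255) as a signed value."""
    return status - 256 if status > 127 else status


def describe(status):
    """Return a human-readable meaning for a check binary's exit status."""
    code = to_signed(status)
    if code == 0:
        return "ok"
    if code > 0:
        return os.strerror(code)  # positive codes are errno values
    return SPECIAL.get(code, "unknown code %d" % code)
```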



REPAIR BINARY
       The repair binary is started with one parameter: the error number that
       caused watchdog to initiate the boot process. After trying to repair
       the system the binary should exit with 0 if the system was successfully
       repaired and thus there is no need to boot anymore. A return value not
       equal to 0 tells watchdog to reboot. The return code of the repair
       binary should be the error number of the error causing watchdog to
       reboot. Be careful with this if you are using the real-time properties
       of watchdog, since watchdog will wait for the return of this binary
       before proceeding.
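
       The contract for a repair binary (one argument in, 0 or the error
       number out) can be sketched as below. The table of fixer callables and
       the names repair and main are hypothetical. A real script would finish
       with sys.exit(main(sys.argv, fixers)).

```python
def repair(error_number, fixers):
    """Try the fixer registered for `error_number`.

    `fixers` maps error numbers to callables that return True on success.
    Returns 0 if the problem was repaired, otherwise the original error
    number so that the caller goes ahead with the reboot.
    """
    fixer = fixers.get(error_number)
    if fixer is not None and fixer():
        return 0
    return error_number


def main(argv, fixers):
    """Entry point: argv[1] is the error number passed by the caller."""
    if len(argv) != 2:
        return 1  # wrong usage; non-zero so that nothing is hidden
    try:
        error_number = int(argv[1])
    except ValueError:
        return 1
    return repair(error_number, fixers)
```

       Keep the fixers short: the caller waits for the binary to return before
       it proceeds.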

BUGS
       None known so far.


AUTHORS
       The original code is an example written by Alan Cox, the author of the
       kernel driver. All additions were written by Michael Meskes. Johnie
       Ingram had the idea of testing the load average. He also took over the
       Debian specific work. Dave Cinege brought up some hardware watchdog
       issues and helped testing this stuff.


FILES
       /dev/watchdog  The watchdog device
       /var/run/watchdog.pid The PID of the running watchdog

SEE ALSO
       watchdog.conf(5)



4th Berkeley Distribution        February 1996                     WATCHDOG(8)
 