This article is excerpted from:
http://blog.csdn.net/rstevens/article/details/1853779
http://blog.csdn.net/liigo/article/details/9227205
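Linux ships with a watchdog implementation for monitoring a running system. It consists of a kernel watchdog module and a user-space watchdog program. The kernel module communicates with user space through the character device /dev/watchdog. As soon as a user-space program opens /dev/watchdog (colloquially, "opening the door and letting the dog out"), the kernel starts a 1-minute timer (the system default). From then on, the program must write to the device at least once every minute ("feeding the dog" on schedule); each write resets the timer. If no write arrives within 1 minute, the timer expires and the system reboots ("the dog bites"). This mechanism keeps the system's core processes running most of the time: if such a process crashes, it can no longer feed the dog, so the watchdog reboots the system and the core process starts again. The technique is used mostly in embedded systems.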
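A user-space program can stop the kernel timer by closing /dev/watchdog. Note that drivers supporting the "magic close" feature stop the timer only if the character 'V' is written just before the close, and a kernel built with CONFIG_WATCHDOG_NOWAYOUT does not stop it at all.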
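The open / feed / close cycle described above can be sketched in a few lines of C. This is a minimal illustration, not code from the original article: the function names wd_open, wd_feed and wd_stop are hypothetical, and the device path is passed in as a parameter so the helpers can be tried against an ordinary file first.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper names, chosen for this sketch only. */

/* Open the watchdog device. On a real watchdog device the open itself
 * starts the kernel timer. Returns a file descriptor, or -1 on error. */
int wd_open(const char *path)
{
    return open(path, O_WRONLY);
}

/* Feed the dog: any write resets the timer. Returns 0 on success. */
int wd_feed(int fd)
{
    return write(fd, "", 1) == 1 ? 0 : -1;   /* writes a single NUL byte */
}

/* Stop the dog: write 'V' (magic close), then close the device.
 * Returns 0 only if both steps succeed. */
int wd_stop(int fd)
{
    int wrote = (write(fd, "V", 1) == 1);
    int closed = (close(fd) == 0);
    return (wrote && closed) ? 0 : -1;
}
```

A caller would typically run fd = wd_open("/dev/watchdog"), call wd_feed(fd) in a loop with a sleep shorter than the timeout, and call wd_stop(fd) on an orderly shutdown. Opening the real device usually requires root and arms the reboot timer, so it is best tried only on a machine that can safely reboot.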
The user-space watchdog daemon:
In user space there is also a daemon called watchdog, which periodically runs checks on the system, including:
- Is the process table full?
- Is there enough free memory?
- Are some files accessible?
- Have some files changed within a given interval?
- Is the average work load too high?
- Has a file table overflow occurred?
- Is a process still running? The process is specified by a pid file.
- Do some IP addresses answer to ping?
- Do network interfaces receive traffic?
- Is the temperature too high? (Temperature data not always available.)
- Execute a user defined command to do arbitrary tests.
If any of these checks fails, the daemon may trigger a soft reboot (simulating the execution of a shutdown command).
It can also drive the kernel watchdog through /dev/watchdog.
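To make one of the checks above concrete, here is a sketch of the "is a process still running?" test based on a pid file. It shows one common technique (sending signal 0 with kill()), not necessarily how the watchdog daemon implements the check; the function name pidfile_process_alive is hypothetical.

```c
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical helper for this sketch.
 * Returns 1 if the process named in the pid file exists,
 *         0 if it does not,
 *        -1 if the pid file cannot be read or parsed. */
int pidfile_process_alive(const char *pidfile)
{
    FILE *f = fopen(pidfile, "r");
    long pid = 0;

    if (f == NULL)
        return -1;
    if (fscanf(f, "%ld", &pid) != 1 || pid <= 0) {
        fclose(f);
        return -1;
    }
    fclose(f);

    /* Signal 0 only performs the existence and permission checks;
     * no signal is actually delivered. */
    if (kill((pid_t)pid, 0) == 0)
        return 1;

    /* EPERM means the process exists but belongs to another user. */
    return (errno == EPERM) ? 1 : 0;
}
```

A monitoring loop could call this once per interval and stop feeding /dev/watchdog (or request a reboot) when it returns 0.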
A sample of watchdog control code follows:
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>      /* write(), sleep() */
#include <sys/ioctl.h>   /* ioctl() */

#include <linux/watchdog.h>

/* Project-local headers, not shown here; they are expected to provide
   INFO(), ERROR(), klog_init() and open_devnull_stdio(). */
#include "log.h"
#include "util.h"

#define DEV_NAME "/dev/watchdog"

int watchdogd_main(int argc, char **argv)
{
    int fd;
    int ret;
    int interval = 10;   /* seconds between feeds */
    int margin = 10;     /* extra seconds before the dog bites */
    int timeout;

    open_devnull_stdio();
    klog_init();

    INFO("Starting watchdogd\n");

    /* Optional arguments: <interval> [<margin>] */
    if (argc >= 2)
        interval = atoi(argv[1]);

    if (argc >= 3)
        margin = atoi(argv[2]);

    timeout = interval + margin;

    /* Opening the device starts the kernel watchdog timer. */
    fd = open(DEV_NAME, O_RDWR);
    if (fd < 0) {
        ERROR("watchdogd: Failed to open %s: %s\n", DEV_NAME, strerror(errno));
        return 1;
    }

    /* Ask the driver for the desired timeout. If it refuses, read back the
       timeout it actually uses and derive the feed interval from that. */
    ret = ioctl(fd, WDIOC_SETTIMEOUT, &timeout);
    if (ret) {
        ERROR("watchdogd: Failed to set timeout to %d: %s\n", timeout, strerror(errno));
        ret = ioctl(fd, WDIOC_GETTIMEOUT, &timeout);
        if (ret) {
            ERROR("watchdogd: Failed to get timeout: %s\n", strerror(errno));
        } else {
            if (timeout > margin)
                interval = timeout - margin;
            else
                interval = 1;
            ERROR("watchdogd: Adjusted interval to timeout returned by driver: timeout %d, interval %d, margin %d\n",
                  timeout, interval, margin);
        }
    }

    /* Feed the dog forever; each write resets the kernel timer. */
    while (1) {
        write(fd, "", 1);
        sleep(interval);
    }
}
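The relationship between interval, margin and timeout in the listing is simple arithmetic, and it can be isolated as a small function for experimenting. The sketch below mirrors the fallback branch of the listing; the function name feed_interval is hypothetical and does not appear in the original code.

```c
/* Hypothetical helper for this sketch: given the timeout the driver
 * actually uses and the desired safety margin (both in seconds),
 * return how often the dog should be fed. Mirrors the fallback
 * branch of the listing: never less than 1 second. */
int feed_interval(int timeout, int margin)
{
    return (timeout > margin) ? (timeout - margin) : 1;
}
```

With the defaults in the listing (interval 10, margin 10) the program requests a 20-second timeout. If the driver instead reports 60 seconds, feed_interval(60, 10) gives 50, so the dog is fed every 50 seconds with 10 seconds to spare. If the driver reports a timeout no larger than the margin, for example 5 seconds, the interval falls back to 1 second.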