问题描述:
在服务器上,概率(一个月1次)出现,某个服务停摆了,日志上看,没有任何报错,而且服务进程也在,只是无响应了。
问题分析:
1,用strace命令查看,发现当前进程阻塞在系统API上, futext_wait_queue_me
2, 用gdb命令 attach 当前阻塞进程,bt显示 call trace,发现是syslog系统调用出现了死锁。
3,进一步分析,发现,某个函数在外部流程和信号处理函数(定时任务)中都调到了,而这个函数内部,调用了syslog输出日志。
4,结论:
服务进程和信号处理函数同时调用syslog,可能导致死锁。
问题验证:
-
#include <errno.h>
-
#include <syslog.h>
-
#include <sys/types.h>
-
#include <sys/socket.h>
-
#include <netinet/in.h>
-
#include <sys/select.h>
-
#include <unistd.h>
-
#include <signal.h>
-
#include <sys/types.h>
-
#include <sys/stat.h>
-
#include <fcntl.h>
-
#include <stdio.h>
-
#include <sys/time.h>
-
#include <sys/un.h>
-
-
int idx = 0;
-
static void sigroutine(int signo)
-
{
-
char* logcnt = "this log from sigroutine,************************************************************************************************************************************************************ end";
-
switch (signo)
-
{
-
case SIGALRM:
-
idx = 0;
-
syslog(LOG_DEBUG, "%s\n", logcnt);
-
break;
-
}
-
return;
-
}
-
-
static void srv_heartbeat_start(void)
-
{
-
struct itimerval value, old_value;
-
signal(SIGALRM, sigroutine);
-
-
value.it_value.tv_sec = 0;
-
value.it_value.tv_usec = 50000;
-
value.it_interval.tv_sec = 0;
-
value.it_interval.tv_usec = 50000;
-
setitimer(ITIMER_REAL, &value, &old_value);
-
}
-
-
int main(int argc, char** argv)
-
{
-
char* logcnt = "++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++";
-
-
srv_heartbeat_start();
-
while (1)
-
{
-
syslog(LOG_INFO, "--> %d\n", idx++);
-
syslog(LOG_DEBUG, "%s\n", logcnt);
-
}
-
}
如上代码,主服务一直循环输入log,起一个定时器,每50ms 发送一个时钟信号,并且在时钟信号处理函数中打印一条日志。以上程序必须syslog死锁的问题。gdb 打印堆栈如下:
#0 __lll_lock_wait_private () at ../nptl/sysdeps/unix/sys
v/linux/x86_64/lowlevellock.S:95
#1 0x00007f73854ec344 in _L_lock_533 () from /lib/x86_64-linux-gnu/libc.so.6
#2 0x00007f73854ebda2 in __GI___vsyslog_chk (pri=, flag=flag@entry=-1, fmt=0x400802 "%s\n", ap=ap@entry=0x7ffcb7e84c38) at ../misc/syslog.c:258
#3 0x00007f73854ec15f in __syslog (pri=, fmt=) at ../misc/syslog.c:117
#4 0x000000000040060f in sigroutine (signo=14) at syslog_lock.c:40
#5
#6 0x00007f73854f30bb in __libc_send (fd=3, buf=0x1a95270, n=33, flags=-2058407747, flags@entry=16384) at ../sysdeps/unix/sysv/linux/x86_64/send.c:27
#7 0x00007f73854ebf31 in __GI___vsyslog_chk (pri=, flag=flag@entry=-1, fmt=0x4008b3 "--> %d\n", ap=ap@entry=0x7ffcb7e853e8) at ../misc/syslog.c:279
#8 0x00007f73854ec15f in __syslog (pri=, fmt=) at ../misc/syslog.c:117
#9 0x00000000004006a2 in main (argc=1, argv=0x7ffcb7e855c8) at syslog_lock.c:65
阅读(8591) | 评论(0) | 转发(0) |