Category: LINUX

2015-08-21 14:25:32



For BUG86475, my findings after some investigation are summarized below (IMO).

 

1. A signal handler must not call non-reentrant functions, directly or indirectly. Examples are calloc and malloc, because they operate on the global memory allocation table.

 

2. However, in the function "certd_sig_handler" (certd.c: 7617), which handles SIGCHLD, the function "create_manifest" (certd-cluster-funcs.c: 339) calls "calloc". As a result, the behavior is unpredictable when SIGCHLD is delivered.

 

3. Now let us analyse the stack trace that follows.
 

          The daemon "certd" called "malloc", as shown by frame #9. Before malloc returned, the signal handler "certd_sig_handler" was invoked and in turn called "calloc", as shown by frame #2.

          My guess is that #9 takes the mutex protecting the global memory allocation table, and #2 tries to take the same mutex before #9 returns. This leads to a deadlock, which is what the part of the output highlighted in blue in the original post indicates.

          The signal handler runs only briefly, so the probability of hitting this is very small, but it is easy to reproduce on an SMP device under high workload.

 

# gdb -p 3698

(gdb) bt

#0  0x00002aaab0a8fd4e in ?? () from /lib64/libc.so.6

#1  0x00002aaab0a1ba51 in ?? () from /lib64/libc.so.6

#2  0x00002aaab0a19dc1 in calloc () from /lib64/libc.so.6

#3  0x00002aaaad9cb9b6 in create_manifest () from /lib64/libhashfiles.so

#4  0x0000000000406db3 in ?? ()

#5  0x00002aaaad7c4a08 in signal_sigaction () from /lib64/libsignal.so

#6  <signal handler called>

#7  0x00002aaab0a15eb5 in ?? () from /lib64/libc.so.6

#8  0x00002aaab0a172b8 in ?? () from /lib64/libc.so.6

#9  0x00002aaab0a19440 in malloc () from /lib64/libc.so.6

#10 0x00002aaaad01d663 in CRYPTO_malloc () from /lib64/libcrypto.so.1.0.0

#11 0x00002aaaad092ad4 in BUF_MEM_grow () from /lib64/libcrypto.so.1.0.0

#12 0x00002aaaad0d4920 in PEM_read_bio () from /lib64/libcrypto.so.1.0.0

#13 0x00002aaaad0d4ec6 in PEM_bytes_read_bio () from /lib64/libcrypto.so.1.0.0

#14 0x00002aaaad0d682f in PEM_ASN1_read_bio () from /lib64/libcrypto.so.1.0.0

#15 0x00002aaaac47c8fb in load_cert () from /lib64/libpkicli.so

#16 0x00002aaaac4897c9 in wg_get_cert_info_by_purpose ()

   from /lib64/libpkicli.so

#17 0x000000000040e33a in ?? ()

#18 0x000000000040ea0f in ?? ()

#19 0x0000000000408b3c in ?? ()

#20 0x0000000000408b7e in ?? ()

#21 0x0000000000408b3c in ?? ()

#22 0x000000000040f203 in ?? ()

---Type <return> to continue, or q <return> to quit---

#23 0x00002aaaabc3108b in ?? () from /lib64/liblistener.so

#24 0x00002aaaabc3003c in ListenLoop () from /lib64/liblistener.so

#25 0x0000000000404fec in ?? ()

#26 0x00002aaab09bebb5 in __libc_start_main () from /lib64/libc.so.6

#27 0x0000000000405415 in ?? ()

 

[Certd hang]

# strace -p 3698

Process 3698 attached - interrupt to quit

futex(0x2aaab0d41600, FUTEX_WAIT_PRIVATE, 2, NULL

Process 3698 detached

4.
Teamtrack ID (Bug/RFE/Task):

           BUG86475: certd sometimes gets stuck because its signal handler is not async-signal-safe

 

        Root Cause (Bug) or Purpose (RFE/Task):

           (1). The signal handler calls non-reentrant functions, for example "calloc" and "free". This is likely to lead to a deadlock.

           (2). Currently, the main thread is likely to leave zombie processes, because its own "SIGCHLD" signals and the "SIGCHLD" signals raised in the function "wgut_system" arrive unexpectedly.

 

        Solution:

           (1). Create a pipe. When a signal is delivered, the signal handler uses only reentrant functions to write the corresponding signal number to the pipe. This is similar to the top half of interrupt handling.

                The main thread always listens for read events on the pipe. When a read event arrives, it does all the rest of the work. It thus performs most of the signal processing, including the calls to non-reentrant functions. This is similar to the bottom half of interrupt handling.

           (2). To avoid zombie processes, the main thread creates a child process only after the handling of "SIGCHLD" is in place.

 


 
