概要记录
发生时间: 公元2010年08月28日
问题现象: java程序莫名挂起,追查挂起的地方,竟然是一个很不起眼的System.out.println("XXXX");
环境描述: 用 daemontools 构建的"进程不会消失的"启动方式.
daemontools维护的是一个 java 服务程序,采用脚本启动,并将控制台的日志通过multilog
转存到日志文件中.
调查过程:
经过调查是在处理log/main 日志的权限是存在不足:
制作新的服务启动脚本时, 因为拷贝 (cp) 回导致权限改变(root),
启动时没有影响,正常运行时log/main/currenct 可以正常写入,
当程序运行一段时间后currenct日志写满了,不能新建日志(目录权限不为logadmin),
此时就很严重了,会造成程序挂起.
一般的程序中总存在多少一点控制台输出,multilog在处理这个信息的时候会阻塞住程序的输出,
直到自己的日志转存工作结束才会释放,但如果转存中出现权限问题,那么悲剧就来了.
日志转存记录脚本:
vi log/run
#!/bin/sh
exec setuidgid logadmin multilog t s1000000 n100 ./main
总结教训:
1,日志的属组千万别改.
2,程序里面少弄点这样的日志.
3,其实最好的就是去掉这个控制台信息转存,如果要了没啥大用的话.
java 堆栈信息和测试程序代码:
jstack -l 2630
2010-09-08 01:33:55
Full thread dump Java HotSpot(TM)
Client VM (14.0-b16 mixed mode, sharing):
"Attach Listener" daemon
prio=10 tid=0xb5014c00 nid=0x16c6 waiting on condition [0x00000000]
java.lang.Thread.State: RUNNABLE
Locked ownable
synchronizers:
- None
"Low Memory Detector" daemon prio=10
tid=0x08c24400 nid=0xa59 runnable [0x00000000]
java.lang.Thread.State:
RUNNABLE
Locked ownable synchronizers:
-
None
"CompilerThread0" daemon prio=10 tid=0x08c22800 nid=0xa58 waiting on
condition [0x00000000]
java.lang.Thread.State: RUNNABLE
Locked
ownable synchronizers:
- None
"Signal Dispatcher" daemon
prio=10 tid=0x08c20c00 nid=0xa57 runnable [0x00000000]
java.lang.Thread.State: RUNNABLE
Locked ownable
synchronizers:
- None
"Finalizer" daemon prio=10
tid=0x08c10c00 nid=0xa56 in Object.wait() [0xb5426000]
java.lang.Thread.State: WAITING (on object monitor)
at
java.lang.Object.wait(Native Method)
- waiting on <0x8be43e70>
(a java.lang.ref.ReferenceQueue$Lock)
at
java.lang.ref.ReferenceQueue.remove(Unknown Source)
- locked
<0x8be43e70> (a java.lang.ref.ReferenceQueue$Lock)
at
java.lang.ref.ReferenceQueue.remove(Unknown Source)
at
java.lang.ref.Finalizer$FinalizerThread.run(Unknown Source)
Locked
ownable synchronizers:
- None
"Reference Handler" daemon
prio=10 tid=0x08c0f400 nid=0xa55 in Object.wait() [0xb5477000]
java.lang.Thread.State: WAITING (on object monitor)
at
java.lang.Object.wait(Native Method)
- waiting on <0x8be43ef8>
(a java.lang.ref.Reference$Lock)
at
java.lang.Object.wait(Object.java:485)
at
java.lang.ref.Reference$ReferenceHandler.run(Unknown Source)
- locked
<0x8be43ef8> (a java.lang.ref.Reference$Lock)
Locked ownable
synchronizers:
- None
"main" prio=10 tid=0x08be9800 nid=0xa53
runnable [0xb76af000]
java.lang.Thread.State: RUNNABLE
at
java.io.FileOutputStream.writeBytes(Native Method)
at
java.io.FileOutputStream.write(Unknown Source)
at
java.io.BufferedOutputStream.flushBuffer(Unknown Source)
at
java.io.BufferedOutputStream.flush(Unknown Source)
- locked
<0x8be46818> (a java.io.BufferedOutputStream)
at
java.io.PrintStream.write(Unknown Source)
- locked <0x8be43f00>
(a java.io.PrintStream)
at
sun.nio.cs.StreamEncoder.writeBytes(Unknown Source)
at
sun.nio.cs.StreamEncoder.implFlushBuffer(Unknown Source)
at
sun.nio.cs.StreamEncoder.flushBuffer(Unknown Source)
- locked
<0x8be46858> (a java.io.OutputStreamWriter)
at
java.io.OutputStreamWriter.flushBuffer(Unknown Source)
at
java.io.PrintStream.write(Unknown Source)
- locked <0x8be43f00>
(a java.io.PrintStream)
at java.io.PrintStream.print(Unknown
Source)
at java.io.PrintStream.println(Unknown Source)
-
locked <0x8be43f00> (a java.io.PrintStream)
at
jp.emcom.cfd.ratechecker.Test.main(Test.java:12)
Locked ownable
synchronizers:
- None
"VM Thread" prio=10 tid=0x08c0d800
nid=0xa54 runnable
"VM Periodic Task Thread" prio=10 tid=0x08c2e400
nid=0xa5a waiting on condition
JNI global references: 741
Test.java
附录:
daemontools + 这个 java 应用的搭建过程:
一,下载安装
wget
wget
wget
二, 制作 sshd 在 daemontools下启动
ls -l /home/src
-rw-r--r-- 1 root root 289280 Jul 13 2001 daemontools-0.76.tar
-rw-r--r-- 1 root root 252264 Jan 27 2004 netqmail-1.05.tar.gz
-rw-r--r-- 1 root root 53019 Mar 18 2000 ucspi-tcp-0.88.tar.gz
安装daemontools
[root@monitor ~]# cd /home/src/
cd /home/src/admin/daemontools-0.76
package/install
daemontools 程序安装完毕
cd /service/
mkdir sshd
chmod +t sshd
mkdir sshd/log
mkdir sshd/log/main
touch sshd/log/status
useradd logadmin
chown logadmin:daemontools sshd/log/main
chown logadmin:daemontools sshd/log/status
[root@monitor ~]#vi /service/sshd/run
#!/bin/sh
PATH=/usr/local/bin:/usr/bin:/bin:/usr/local/sbin:/usr/sbin:/sbin
export PATH
sleep 5
tcprules allow.cdb allow.tmp < allow
exec env - PATH=$PATH \
tcpserver -vHRDl0 -x allow.cdb -u 0 0 ssh sshd -i -e 2>&1
[root@monitor ~]# chmod 755 /service/sshd/run
vi /service/sshd/log/run
#!/bin/sh
exec setuidgid logadmin multilog t s1000000 n100 ./main
[root@monitor ~]# chmod 755 /service/sshd/log/run
[root@monitor ~]# find /command -type l |xargs ls -l
lrwxrwxrwx 1 root root 41 Apr 10 22:11 /command/envdir -> /package/admin/daemontools/command/envdir
lrwxrwxrwx 1 root root 44 Apr 10 22:11 /command/envuidgid -> /package/admin/daemontools/command/envuidgid
lrwxrwxrwx 1 root root 41 Apr 10 22:11 /command/fghack -> /package/admin/daemontools/command/fghack
lrwxrwxrwx 1 root root 43 Apr 10 22:11 /command/multilog -> /package/admin/daemontools/command/multilog
lrwxrwxrwx 1 root root 43 Apr 10 22:11 /command/pgrphack -> /package/admin/daemontools/command/pgrphack
lrwxrwxrwx 1 root root 48 Apr 10 22:11 /command/readproctitle -> /package/admin/daemontools/command/readproctitle
lrwxrwxrwx 1 root root 42 Apr 10 22:11 /command/setlock -> /package/admin/daemontools/command/setlock
lrwxrwxrwx 1 root root 44 Apr 10 22:11 /command/setuidgid -> /package/admin/daemontools/command/setuidgid
lrwxrwxrwx 1 root root 44 Apr 10 22:11 /command/softlimit -> /package/admin/daemontools/command/softlimit
lrwxrwxrwx 1 root root 44 Apr 10 22:11 /command/supervise -> /package/admin/daemontools/command/supervise
lrwxrwxrwx 1 root root 38 Apr 10 22:11 /command/svc -> /package/admin/daemontools/command/svc
lrwxrwxrwx 1 root root 39 Apr 10 22:11 /command/svok -> /package/admin/daemontools/command/svok
lrwxrwxrwx 1 root root 41 Apr 10 22:11 /command/svscan -> /package/admin/daemontools/command/svscan
lrwxrwxrwx 1 root root 45 Apr 10 22:11 /command/svscanboot -> /package/admin/daemontools/command/svscanboot
lrwxrwxrwx 1 root root 41 Apr 10 22:11 /command/svstat -> /package/admin/daemontools/command/svstat
lrwxrwxrwx 1 root root 41 Apr 10 22:11 /command/tai64n -> /package/admin/daemontools/command/tai64n
lrwxrwxrwx 1 root root 46 Apr 10 22:11 /command/tai64nlocal -> /package/admin/daemontools/command/tai64nlocal
[root@monitor ~]# find /usr/local/bin -type l -exec ls -l {} \;
lrwxrwxrwx 1 root root 18 Apr 10 22:11 /usr/local/bin/envuidgid -> /command/envuidgid
lrwxrwxrwx 1 root root 17 Apr 10 22:11 /usr/local/bin/pgrphack -> /command/pgrphack
lrwxrwxrwx 1 root root 15 Apr 10 22:11 /usr/local/bin/envdir -> /command/envdir
lrwxrwxrwx 1 root root 18 Apr 10 22:11 /usr/local/bin/supervise -> /command/supervise
lrwxrwxrwx 1 root root 15 Apr 10 22:11 /usr/local/bin/svstat -> /command/svstat
lrwxrwxrwx 1 root root 15 Apr 10 22:11 /usr/local/bin/svscan -> /command/svscan
lrwxrwxrwx 1 root root 16 Apr 10 22:11 /usr/local/bin/setlock -> /command/setlock
lrwxrwxrwx 1 root root 17 Apr 10 22:11 /usr/local/bin/multilog -> /command/multilog
lrwxrwxrwx 1 root root 18 Apr 10 22:11 /usr/local/bin/setuidgid -> /command/setuidgid
lrwxrwxrwx 1 root root 20 Apr 10 22:11 /usr/local/bin/tai64nlocal -> /command/tai64nlocal
lrwxrwxrwx 1 root root 18 Apr 10 22:11 /usr/local/bin/softlimit -> /command/softlimit
lrwxrwxrwx 1 root root 15 Apr 10 22:11 /usr/local/bin/tai64n -> /command/tai64n
lrwxrwxrwx 1 root root 13 Apr 10 22:11 /usr/local/bin/svok -> /command/svok
lrwxrwxrwx 1 root root 15 Apr 10 22:11 /usr/local/bin/fghack -> /command/fghack
lrwxrwxrwx 1 root root 19 Apr 10 22:11 /usr/local/bin/svscanboot -> /command/svscanboot
lrwxrwxrwx 1 root root 12 Apr 10 22:11 /usr/local/bin/svc -> /command/svc
lrwxrwxrwx 1 root root 22 Apr 10 22:11 /usr/local/bin/readproctitle -> /command/readproctitle
[root@monitor ~]# find /command -type l |xargs rm
find /usr/local/bin -type l -exec rm {} \;
cp -a /package/admin/daemontools-0.76/command/* /usr/local/bin/
vi /service/sshd/allow
10.4.:allow
127.0.0.1:allow
:deny
[root@monitor ~]# vi /etc/rc.d/init.d/svscan
#!/bin/sh
#
# svscan Start svscan.
#
# chkconfig: 345 99 50
# description: svscan starts and monitors a collection of services.
# see
# processname: svscan
# pidfile: /var/run/svscan.pid
# service directory: /service
# Source function library.
. /etc/rc.d/init.d/functions
export PATH="/usr/local/bin:$PATH"
SVDIR=/service
case "$1" in
start)
echo -n "Starting svscan: "
exec env - PATH="/usr/local/bin:/usr/bin:/bin" SVDIR="$SVDIR" \
csh -cf 'svscan $SVDIR &; echo $! > /var/run/svscan.pid'
RETVAL=$? # Sorry, I don't catch exit code of svscan.
echo
[ $RETVAL -eq 0 ] && touch /var/lock/subsys/svscan
;;
stop)
echo -n "Stopping svscan: "
killproc svscan
RETVAL=$?
svc -dx $SVDIR/* $SVDIR/*/log
echo
[ $RETVAL -eq 0 ] && rm -f /var/lock/subsys/svscan
;;
status)
status svscan
RETVAL=$?
;;
restart|reload)
$0 stop
$0 start
RETVAL=$?
;;
*)
echo "Usage: $0 {start|stop|status|restart|reload}"
exit 1
esac
exit $RETVAL
[root@monitor ~]# chmod 755 /etc/rc.d/init.d/svscan
[root@monitor ~]# chkconfig --add svscan
[root@monitor ~]# /etc/rc.d/init.d/svscan start
[root@monitor ~]# vi /etc/inittab
#SV:123456:respawn:/command/svscanboot
[root@monitor ~]# /etc/init.d/sshd stop (关闭系统的sshd,停止前请后请不要退出终端)
svc -u /service/sshd/ 启动daemontools 下的 sshd
chkconfig sshd off
chkconfig --list |grep sshd
sshd 0:off 1:off 2:off 3:off 4:off 5:off 6:off
[root@monitor ~]# ps -ef|grep sshd
root 3206 1 0 18:25 ? 00:00:03 sshd: root@pts/0
root 6010 6009 0 19:23 pts/0 00:00:00 supervise sshd
root 6011 6010 0 19:23 pts/0 00:00:00 tcpserver -vHRDl0 -x allow.cdb -u 0 0 ssh sshd -i -e
root 6016 6011 0 19:23 pts/0 00:00:00 sshd: root@pts/1
root 6054 3208 0 19:24 pts/0 00:00:00 grep sshd
三.制作GW在 daemontools下启动
mkdir -p /app/workdir/XXXGW
cd /app/workdir/XXXGW
mkdir -p log/main
touch log/status
chown logadmin.logadmin log/main
chown logadmin.logadmin log/status
vi /app/shell/runXXX.sh
#!/bin/sh
#---------------------------------------------------------------------------
PATH=/usr/local/jdk/bin:/usr/local/mysql/bin:/usr/kerberos/bin:/usr/local/bin:/bin:/usr/bin:/app/bin
export PATH
sleep 3
source /app/shell/setenv.sh
if [ "$OS" = "Windows_NT" ]; then
ARGS=""
else
MIN_MEMORY=2048m
MAX_MEMORY=2048m
MIN_NEW_SIZE=8m
MAX_NEW_SIZE=8m
ARGS="-Xms${MIN_MEMORY} -Xmx${MAX_MEMORY} -XX:PermSize=128M -XX:NewSize=${MIN_NEW_SIZE} -XX:MaxNewSize=${MAX_NEW_SIZE}"
ARGS2="-Djava.util.prefs.systemRoot=/app/XXXGW/samples -Djava.util.prefs.syncInterval=1"
fi
mkdir -p /app/XXXGW/samples/.systemPrefs
exec java -DPROCESS_NAME=XXXGW ${ARGS2} ${ARGS3} -DconfigPath=/app/XXXGW/conf/ ${ARGS} XXXStartUp config.xml 2>&1
启动脚本的链接:
ln -s /app/shell/runXXX.sh /app/workdir/XXXGW/run
vi log/run
#!/bin/sh
exec setuidgid logadmin multilog t s1000000 n100 ./main
chmod 755 log/run
注意log 大小现在是975K
注意确认:
启动时要做的软连接:
ln -s /app/workdir/XXXGW /service/XXX
阅读(3076) | 评论(0) | 转发(0) |