Chinaunix首页 | 论坛 | 博客
  • 博客访问: 166195
  • 博文数量: 20
  • 博客积分: 1610
  • 博客等级: 上尉
  • 技术积分: 230
  • 用 户 组: 普通用户
  • 注册时间: 2007-03-16 15:40
文章分类

全部博文(20)

文章存档

2013年(1)

2012年(2)

2011年(8)

2010年(1)

2009年(2)

2008年(2)

2007年(4)

分类: LINUX

2010-09-08 16:28:16

概要记录

发生时间: 公元2010年08月28日
问题现象: java程序莫名挂起,追查挂起的地方,竟然是一个很不起眼的System.out.println("XXXX");
环境描述: 用 daemontools 构建的"进程不会消失的"启动方式.
         daemontools维护的是一个 java 服务程序,采用脚本启动,并将控制台的日志通过multilog
         转存到日志文件中.

调查过程:

经过调查是在处理log/main 日志的权限是存在不足:
制作新的服务启动脚本时, 因为拷贝 (cp) 回导致权限改变(root),
启动时没有影响,正常运行时log/main/currenct 可以正常写入,
当程序运行一段时间后currenct日志写满了,不能新建日志(目录权限不为logadmin),
此时就很严重了,会造成程序挂起.

一般的程序中总存在多少一点控制台输出,multilog在处理这个信息的时候会阻塞住程序的输出,
直到自己的日志转存工作结束才会释放,但如果转存中出现权限问题,那么悲剧就来了.

日志转存记录脚本:
vi log/run
#!/bin/sh
exec setuidgid logadmin multilog t s1000000 n100 ./main

总结教训:

1,日志的属组千万别改.
2,程序里面少弄点这样的日志.
3,其实最好的就是去掉这个控制台信息转存,如果要了没啥大用的话.



java 堆栈信息和测试程序代码:
jstack  -l 2630
2010-09-08 01:33:55
Full thread dump Java HotSpot(TM) Client VM (14.0-b16 mixed mode, sharing):

"Attach Listener" daemon prio=10 tid=0xb5014c00 nid=0x16c6 waiting on condition [0x00000000]
   java.lang.Thread.State: RUNNABLE

   Locked ownable synchronizers:
        - None

"Low Memory Detector" daemon prio=10 tid=0x08c24400 nid=0xa59 runnable [0x00000000]
   java.lang.Thread.State: RUNNABLE

   Locked ownable synchronizers:
        - None

"CompilerThread0" daemon prio=10 tid=0x08c22800 nid=0xa58 waiting on condition [0x00000000]
   java.lang.Thread.State: RUNNABLE

   Locked ownable synchronizers:
        - None

"Signal Dispatcher" daemon prio=10 tid=0x08c20c00 nid=0xa57 runnable [0x00000000]
   java.lang.Thread.State: RUNNABLE

   Locked ownable synchronizers:
        - None

"Finalizer" daemon prio=10 tid=0x08c10c00 nid=0xa56 in Object.wait() [0xb5426000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0x8be43e70> (a java.lang.ref.ReferenceQueue$Lock)
        at java.lang.ref.ReferenceQueue.remove(Unknown Source)
        - locked <0x8be43e70> (a java.lang.ref.ReferenceQueue$Lock)
        at java.lang.ref.ReferenceQueue.remove(Unknown Source)
        at java.lang.ref.Finalizer$FinalizerThread.run(Unknown Source)

   Locked ownable synchronizers:
        - None

"Reference Handler" daemon prio=10 tid=0x08c0f400 nid=0xa55 in Object.wait() [0xb5477000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0x8be43ef8> (a java.lang.ref.Reference$Lock)
        at java.lang.Object.wait(Object.java:485)
        at java.lang.ref.Reference$ReferenceHandler.run(Unknown Source)
        - locked <0x8be43ef8> (a java.lang.ref.Reference$Lock)

   Locked ownable synchronizers:
        - None

"main" prio=10 tid=0x08be9800 nid=0xa53 runnable [0xb76af000]
   java.lang.Thread.State: RUNNABLE
        at java.io.FileOutputStream.writeBytes(Native Method)
        at java.io.FileOutputStream.write(Unknown Source)
        at java.io.BufferedOutputStream.flushBuffer(Unknown Source)
        at java.io.BufferedOutputStream.flush(Unknown Source)
        - locked <0x8be46818> (a java.io.BufferedOutputStream)
        at java.io.PrintStream.write(Unknown Source)
        - locked <0x8be43f00> (a java.io.PrintStream)
        at sun.nio.cs.StreamEncoder.writeBytes(Unknown Source)
        at sun.nio.cs.StreamEncoder.implFlushBuffer(Unknown Source)
        at sun.nio.cs.StreamEncoder.flushBuffer(Unknown Source)
        - locked <0x8be46858> (a java.io.OutputStreamWriter)
        at java.io.OutputStreamWriter.flushBuffer(Unknown Source)
        at java.io.PrintStream.write(Unknown Source)
        - locked <0x8be43f00> (a java.io.PrintStream)
        at java.io.PrintStream.print(Unknown Source)
        at java.io.PrintStream.println(Unknown Source)
        - locked <0x8be43f00> (a java.io.PrintStream)
        at jp.emcom.cfd.ratechecker.Test.main(Test.java:12)

   Locked ownable synchronizers:
        - None

"VM Thread" prio=10 tid=0x08c0d800 nid=0xa54 runnable

"VM Periodic Task Thread" prio=10 tid=0x08c2e400 nid=0xa5a waiting on condition

JNI global references: 741

 
Test.java


附录:

daemontools + 这个 java 应用的搭建过程:
一,下载安装
wget
wget
wget

二, 制作 sshd 在 daemontools下启动
ls -l /home/src
-rw-r--r-- 1 root root 289280 Jul 13  2001 daemontools-0.76.tar
-rw-r--r-- 1 root root 252264 Jan 27  2004 netqmail-1.05.tar.gz
-rw-r--r-- 1 root root  53019 Mar 18  2000 ucspi-tcp-0.88.tar.gz
安装daemontools
[root@monitor ~]# cd /home/src/
cd /home/src/admin/daemontools-0.76
package/install
daemontools 程序安装完毕

cd /service/
mkdir sshd
chmod +t sshd
mkdir sshd/log
mkdir sshd/log/main
touch sshd/log/status
useradd logadmin
chown logadmin:daemontools sshd/log/main
chown logadmin:daemontools sshd/log/status

[root@monitor ~]#vi /service/sshd/run
#!/bin/sh

PATH=/usr/local/bin:/usr/bin:/bin:/usr/local/sbin:/usr/sbin:/sbin
export PATH
sleep 5

tcprules allow.cdb allow.tmp < allow
exec env - PATH=$PATH \
tcpserver -vHRDl0 -x allow.cdb -u 0 0 ssh sshd -i -e 2>&1

[root@monitor ~]# chmod 755 /service/sshd/run
vi /service/sshd/log/run
#!/bin/sh

exec setuidgid logadmin multilog t s1000000 n100 ./main


[root@monitor ~]# chmod 755 /service/sshd/log/run
[root@monitor ~]# find /command -type l |xargs ls -l
lrwxrwxrwx 1 root root 41 Apr 10 22:11 /command/envdir -> /package/admin/daemontools/command/envdir
lrwxrwxrwx 1 root root 44 Apr 10 22:11 /command/envuidgid -> /package/admin/daemontools/command/envuidgid
lrwxrwxrwx 1 root root 41 Apr 10 22:11 /command/fghack -> /package/admin/daemontools/command/fghack
lrwxrwxrwx 1 root root 43 Apr 10 22:11 /command/multilog -> /package/admin/daemontools/command/multilog
lrwxrwxrwx 1 root root 43 Apr 10 22:11 /command/pgrphack -> /package/admin/daemontools/command/pgrphack
lrwxrwxrwx 1 root root 48 Apr 10 22:11 /command/readproctitle -> /package/admin/daemontools/command/readproctitle
lrwxrwxrwx 1 root root 42 Apr 10 22:11 /command/setlock -> /package/admin/daemontools/command/setlock
lrwxrwxrwx 1 root root 44 Apr 10 22:11 /command/setuidgid -> /package/admin/daemontools/command/setuidgid
lrwxrwxrwx 1 root root 44 Apr 10 22:11 /command/softlimit -> /package/admin/daemontools/command/softlimit
lrwxrwxrwx 1 root root 44 Apr 10 22:11 /command/supervise -> /package/admin/daemontools/command/supervise
lrwxrwxrwx 1 root root 38 Apr 10 22:11 /command/svc -> /package/admin/daemontools/command/svc
lrwxrwxrwx 1 root root 39 Apr 10 22:11 /command/svok -> /package/admin/daemontools/command/svok
lrwxrwxrwx 1 root root 41 Apr 10 22:11 /command/svscan -> /package/admin/daemontools/command/svscan
lrwxrwxrwx 1 root root 45 Apr 10 22:11 /command/svscanboot -> /package/admin/daemontools/command/svscanboot
lrwxrwxrwx 1 root root 41 Apr 10 22:11 /command/svstat -> /package/admin/daemontools/command/svstat
lrwxrwxrwx 1 root root 41 Apr 10 22:11 /command/tai64n -> /package/admin/daemontools/command/tai64n
lrwxrwxrwx 1 root root 46 Apr 10 22:11 /command/tai64nlocal -> /package/admin/daemontools/command/tai64nlocal

[root@monitor ~]# find /usr/local/bin -type l -exec ls -l {} \;
lrwxrwxrwx 1 root root 18 Apr 10 22:11 /usr/local/bin/envuidgid -> /command/envuidgid
lrwxrwxrwx 1 root root 17 Apr 10 22:11 /usr/local/bin/pgrphack -> /command/pgrphack
lrwxrwxrwx 1 root root 15 Apr 10 22:11 /usr/local/bin/envdir -> /command/envdir
lrwxrwxrwx 1 root root 18 Apr 10 22:11 /usr/local/bin/supervise -> /command/supervise
lrwxrwxrwx 1 root root 15 Apr 10 22:11 /usr/local/bin/svstat -> /command/svstat
lrwxrwxrwx 1 root root 15 Apr 10 22:11 /usr/local/bin/svscan -> /command/svscan
lrwxrwxrwx 1 root root 16 Apr 10 22:11 /usr/local/bin/setlock -> /command/setlock
lrwxrwxrwx 1 root root 17 Apr 10 22:11 /usr/local/bin/multilog -> /command/multilog
lrwxrwxrwx 1 root root 18 Apr 10 22:11 /usr/local/bin/setuidgid -> /command/setuidgid
lrwxrwxrwx 1 root root 20 Apr 10 22:11 /usr/local/bin/tai64nlocal -> /command/tai64nlocal
lrwxrwxrwx 1 root root 18 Apr 10 22:11 /usr/local/bin/softlimit -> /command/softlimit
lrwxrwxrwx 1 root root 15 Apr 10 22:11 /usr/local/bin/tai64n -> /command/tai64n
lrwxrwxrwx 1 root root 13 Apr 10 22:11 /usr/local/bin/svok -> /command/svok
lrwxrwxrwx 1 root root 15 Apr 10 22:11 /usr/local/bin/fghack -> /command/fghack
lrwxrwxrwx 1 root root 19 Apr 10 22:11 /usr/local/bin/svscanboot -> /command/svscanboot
lrwxrwxrwx 1 root root 12 Apr 10 22:11 /usr/local/bin/svc -> /command/svc
lrwxrwxrwx 1 root root 22 Apr 10 22:11 /usr/local/bin/readproctitle -> /command/readproctitle

[root@monitor ~]# find /command -type l |xargs rm
find /usr/local/bin -type l -exec rm {} \;
cp -a /package/admin/daemontools-0.76/command/* /usr/local/bin/
vi /service/sshd/allow
10.4.:allow
127.0.0.1:allow
:deny

[root@monitor ~]# vi /etc/rc.d/init.d/svscan
#!/bin/sh
#
# svscan        Start svscan.
#
# chkconfig: 345 99 50
# description:  svscan starts and monitors a collection of services.
#                see
# processname: svscan
# pidfile: /var/run/svscan.pid
# service directory: /service

# Source function library.
. /etc/rc.d/init.d/functions

export PATH="/usr/local/bin:$PATH"
SVDIR=/service

case "$1" in
  start)
    echo -n "Starting svscan: "
    exec env - PATH="/usr/local/bin:/usr/bin:/bin" SVDIR="$SVDIR" \
      csh -cf 'svscan $SVDIR &; echo $! > /var/run/svscan.pid'
    RETVAL=$?   # Sorry, I don't catch exit code of svscan.
    echo
    [ $RETVAL -eq 0 ] && touch /var/lock/subsys/svscan
    ;;
  stop)
    echo -n "Stopping svscan: "
    killproc svscan
    RETVAL=$?
    svc -dx $SVDIR/* $SVDIR/*/log
    echo
    [ $RETVAL -eq 0 ] && rm -f /var/lock/subsys/svscan
    ;;
  status)
    status svscan
    RETVAL=$?
    ;;
  restart|reload)
    $0 stop
    $0 start
    RETVAL=$?
    ;;
  *)
    echo "Usage: $0 {start|stop|status|restart|reload}"
    exit 1
esac

exit $RETVAL

[root@monitor ~]# chmod 755 /etc/rc.d/init.d/svscan
[root@monitor ~]# chkconfig --add svscan
[root@monitor ~]# /etc/rc.d/init.d/svscan start
[root@monitor ~]# vi /etc/inittab
#SV:123456:respawn:/command/svscanboot

[root@monitor ~]# /etc/init.d/sshd stop   (关闭系统的sshd,停止前请后请不要退出终端)
svc -u /service/sshd/  启动daemontools 下的 sshd
chkconfig sshd off
chkconfig --list |grep sshd
sshd            0:off   1:off   2:off   3:off   4:off   5:off   6:off
[root@monitor ~]#  ps -ef|grep sshd
root      3206     1  0 18:25 ?        00:00:03 sshd: root@pts/0
root      6010  6009  0 19:23 pts/0    00:00:00 supervise sshd
root      6011  6010  0 19:23 pts/0    00:00:00 tcpserver -vHRDl0 -x allow.cdb -u 0 0 ssh sshd -i -e
root      6016  6011  0 19:23 pts/0    00:00:00 sshd: root@pts/1
root      6054  3208  0 19:24 pts/0    00:00:00 grep sshd


三.制作GW在 daemontools下启动

mkdir  -p /app/workdir/XXXGW
cd /app/workdir/XXXGW
mkdir -p log/main
touch log/status
chown logadmin.logadmin log/main
chown logadmin.logadmin log/status

vi /app/shell/runXXX.sh
#!/bin/sh
#---------------------------------------------------------------------------
PATH=/usr/local/jdk/bin:/usr/local/mysql/bin:/usr/kerberos/bin:/usr/local/bin:/bin:/usr/bin:/app/bin
export PATH
sleep 3
source /app/shell/setenv.sh
if [ "$OS" = "Windows_NT" ]; then
        ARGS=""
else
        MIN_MEMORY=2048m
        MAX_MEMORY=2048m
        MIN_NEW_SIZE=8m
        MAX_NEW_SIZE=8m

        ARGS="-Xms${MIN_MEMORY} -Xmx${MAX_MEMORY} -XX:PermSize=128M  -XX:NewSize=${MIN_NEW_SIZE} -XX:MaxNewSize=${MAX_NEW_SIZE}"
        ARGS2="-Djava.util.prefs.systemRoot=/app/XXXGW/samples -Djava.util.prefs.syncInterval=1"


fi

mkdir -p /app/XXXGW/samples/.systemPrefs

exec java -DPROCESS_NAME=XXXGW ${ARGS2} ${ARGS3} -DconfigPath=/app/XXXGW/conf/ ${ARGS} XXXStartUp config.xml 2>&1

启动脚本的链接:
ln -s  /app/shell/runXXX.sh /app/workdir/XXXGW/run

vi log/run
#!/bin/sh

exec setuidgid logadmin multilog t s1000000 n100 ./main

chmod 755 log/run
注意log 大小现在是975K

注意确认:
启动时要做的软连接:
ln -s /app/workdir/XXXGW /service/XXX







阅读(3053) | 评论(0) | 转发(0) |
给主人留下些什么吧!~~